Video storytelling is often constrained by available material, limiting creative expression and leaving undesired narrative gaps. Generative video offers a new way to address these limitations by augmenting captured media with tailored visuals. To explore this potential, we interviewed eight video creators to identify opportunities and challenges in integrating generative video into their workflows. Building on these insights and established filmmaking principles, we developed Vidmento, a tool for authoring hybrid video stories that combine captured and generated media through context-aware expansion. Vidmento surfaces opportunities for story development, generates clips that blend stylistically and narratively with surrounding media, and provides controls for refinement. In a study with 12 creators, Vidmento supported narrative development and exploration by systematically expanding initial materials with generative media, enabling expressive video storytelling aligned with creative intent. We highlight how creators bridge story gaps with generative content and where they find this blending capability most valuable.
Sound effects (SFX) are critical to video storytelling, immersing viewers, directing attention, and shaping emotion. However, crafting an effective soundscape is difficult: creators must decide how to source, place, layer, and mix sounds to support the narrative. Generative text-to-SFX tools enable users to create custom sounds, but creators often struggle to describe sounds with words and lack control over individual stems in premixed outputs. We propose SoundStager, an AI-assisted tool for designing generative soundscapes for video. SoundStager analyzes the video narrative to create layered audio scenes (comprising keynote, signal, soundmark, and archetypal sounds) and supports iterative refinement through a combination of conversational and analog controls. SoundStager’s design was informed by formative studies with six professional sound designers and six video creators, as well as insights from sound design literature. Our user evaluation with twelve video creators shows that SoundStager enables users to quickly create satisfactory soundscapes while retaining creative control.
Humans think visually—we remember in images, dream in pictures, and use visual metaphors to communicate. Yet, most creative writing tools remain text-centric, limiting how writers plan and translate ideas. We present Vistoria, a system for synchronized image-text co-editing in fictional story writing. A formative Wizard-of-Oz co-design study with 10 story writers revealed how sketches, images, and text serve as essential elements for ideation and organization. Drawing on theories of Instrumental Interaction, Vistoria introduces instrumental operations (Lasso, Collage, Perspective Shift, and Filter) that enable seamless narrative exploration across modalities. A controlled study with 12 participants shows that co-editing enhances expressiveness, immersion, and collaboration, opening space for writers to follow divergent story directions and craft more vivid, detailed narratives. While multimodality increased cognitive demand, participants reported stronger senses of ownership and agency. These findings demonstrate how multimodal co-editing expands creative potential by balancing abstraction and concreteness in narrative development.
Motion graphics, which bring logos, text, and other illustrations to life, are greatly enhanced by sound effects. Sound design for motion graphics presents unique challenges due to their short, abstract nature. Sound designers must identify opportunities for adding sound, decide on the sound's character to match the visual graphics, synchronize sounds with events, and align sonic properties with motions. We introduce MoSound, an interactive system that supports all steps of this creation process. We designed the interface of MoSound based on formative studies with practitioners and implemented the system as a combination of visual event detection, spatial attribute mapping, and generative sound stylization. We demonstrate MoSound on a variety of examples, showing that it is capable of creating high-quality soundtracks while remaining accessible to novices.
Creativity support tools (CSTs) increasingly include image-generation features. The underlying diffusion models enact a particular image-diffusion process that AI CSTs tend to obscure within a black box. Artists’ creative control is limited to indirect manipulation (prompting), chaining these "black boxes" together, or using ML-engineering skills to build custom black boxes. Seeking to maintain the low threshold offered by prompting while raising the ceiling of expressive interactions, we built Noise Pilot: a multi-layered approach to supporting diffusion-based creative processes at three levels of depth. We used Noise Pilot as a probe to study the artistic processes of 9 artists over a 2-week period. Artists engaged with diffusion at different levels of manipulative depth and crafted reusable artifacts to enact bespoke diffusion processes; some produced results impossible to achieve with prompting alone. We discuss how black-box AIs in CSTs limit creative power, and propose subverting this by favoring visibility over obscurity and materiality over personification.
In pre-production, filmmakers and 3D animation experts must rapidly prototype ideas to explore a film's possibilities before full-scale production, yet conventional approaches involve trade-offs between efficiency and expressiveness. Hand-drawn storyboards often lack the spatial precision needed for complex cinematography, while 3D previsualization demands expertise and high-quality rigged assets. To address this gap, we present PrevizWhiz, a system that leverages rough 3D scenes in combination with generative image and video models to create stylized video previews. The workflow integrates frame-level image restyling with adjustable resemblance, time-based editing through motion paths or external video inputs, and refinement into high-fidelity video clips. A study with filmmakers demonstrates that PrevizWhiz lowers technical barriers, accelerates creative iteration, and effectively bridges communication gaps, while also surfacing challenges of continuity, authorship, and ethical considerations in AI-assisted filmmaking.
Creativity support tools have begun to incorporate GenAI for exploring ideas. However, our preliminary study with nine designers showed that current GenAI tools lack explicit support for iteratively evolving, reflecting upon, and tracking design alternatives. We developed DesignTrace, an early-stage GenAI design tool that allows designers to experiment with semantically relevant visual variations in an interactive design space. Its representation captures the progression of designers’ visual and semantic ideas through command histories, state tracking, and an interactive branching structure. A study with twelve professional designers shows that DesignTrace’s palette helps designers express, explore, and reflect on design intentions. Its interactive branching structure helps them maintain visual consistency across design iterations, remember and revisit earlier design decisions, and see connections across ideas. Our work shows how re-envisioning GenAI-based interfaces around explicit design traces enables designers to benefit from generative capabilities while maintaining control as they explore design variants.