Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images based on textual prompts. However, crafting prompts that accurately capture the user's creative intent remains challenging. It often involves laborious trial and error to ensure that the model interprets the prompts in alignment with the user's intention. To address these challenges, we present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Promptify utilizes a suggestion engine powered by large language models to help users quickly explore and craft diverse prompts. Our interface allows users to organize the generated images flexibly, and based on their preferences, Promptify suggests potential changes to the original prompt. This feedback loop enables users to iteratively refine their prompts and enhance desired features while avoiding unwanted ones. Our user study shows that Promptify effectively facilitates the text-to-image workflow, allowing users to create visually appealing images on their first attempt while imposing significantly less cognitive load than a widely used baseline tool.
https://doi.org/10.1145/3586183.3606725
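
To illustrate the kind of feedback loop Promptify describes, the Python sketch below shows one plausible way such a cycle could be wired up: the user sorts generated images into liked and disliked groups, and a language model is asked to revise the prompt accordingly. This is an illustrative simplification, not the authors' implementation; llm, generate_images, and collect_feedback are hypothetical placeholders for a text-completion callable, a text-to-image backend, and the user's image-organization step, and the liked/disliked lists are assumed to be short textual descriptions of the grouped images.

from typing import Callable, List

def refine_prompt(prompt: str,
                  liked: List[str],
                  disliked: List[str],
                  llm: Callable[[str], str]) -> str:
    # Ask the language model for a revised prompt that keeps qualities the
    # user liked and steers away from qualities the user disliked.
    instruction = (
        "Revise this text-to-image prompt.\n"
        f"Current prompt: {prompt}\n"
        f"Keep qualities from these liked examples: {', '.join(liked)}\n"
        f"Avoid qualities from these disliked examples: {', '.join(disliked)}\n"
        "Return only the revised prompt."
    )
    return llm(instruction).strip()

def exploration_loop(seed_prompt: str,
                     llm: Callable[[str], str],
                     generate_images: Callable[[str], list],
                     collect_feedback: Callable[[list], tuple],
                     rounds: int = 3) -> str:
    # Generate, let the user organize the results, then refine and repeat.
    prompt = seed_prompt
    for _ in range(rounds):
        images = generate_images(prompt)            # e.g. a diffusion model
        liked, disliked = collect_feedback(images)  # user sorts the images
        prompt = refine_prompt(prompt, liked, disliked, llm)
    return prompt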
Pose-aware visual effects, where graphics assets and animations are rendered reactively to the human pose, have become increasingly popular, appearing on mobile devices, the web, and even head-mounted displays such as AR glasses. Yet creating such effects remains difficult for novices. In a traditional video editing workflow, a creator can use keyframes to produce expressive but non-adaptive results that cannot be reused for other videos. Alternatively, programming-based approaches allow users to develop interactive effects but make it cumbersome to quickly express creative intent. In this work, we propose a lightweight visual programming workflow for authoring adaptive and expressive pose effects. By combining a programming-by-demonstration paradigm with visual programming, we simplify three key tasks in the authoring process: creating pose triggers, designing animation parameters, and rendering. We evaluated our system with a qualitative user study and a replicated-example study, finding that all participants could create effects efficiently.
https://doi.org/10.1145/3586183.3606788
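
As a rough illustration of the pose-trigger idea above, the sketch below shows how a trigger could be created by demonstration: a demonstrated pose is normalized and stored, and an effect fires once the live pose comes close enough to it. The keypoint indices, normalization, and threshold are illustrative assumptions, not the paper's system.

import numpy as np

# Assumed keypoint indices for illustration (they depend on the pose estimator).
HIP_L, HIP_R, NECK = 11, 12, 1

def normalize(pose: np.ndarray) -> np.ndarray:
    # pose: (K, 2) array of 2D keypoints. Center on the hip midpoint and
    # scale by torso length so the trigger ignores position and body size.
    center = (pose[HIP_L] + pose[HIP_R]) / 2.0
    scale = np.linalg.norm(pose[NECK] - center) + 1e-6
    return (pose - center) / scale

def make_trigger(demo_pose: np.ndarray, threshold: float = 0.35):
    # "Programming by demonstration": store the demonstrated pose and return
    # a predicate that fires when a live pose is close enough to it.
    ref = normalize(demo_pose)
    def is_triggered(live_pose: np.ndarray) -> bool:
        distance = np.linalg.norm(normalize(live_pose) - ref, axis=1).mean()
        return distance < threshold
    return is_triggered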
With appealing visual effects, kinetic typography (animated text) has prevailed in movies, advertisements, and social media. However, crafting its animation scheme remains challenging and time-consuming. We propose an automatic framework to transfer the animation scheme of a rigid body in a given meme GIF to text in vector format. First, the trajectories of key points on the GIF anchor are extracted and mapped to the text's control points based on local affine transformation. Then the temporal positions of the control points are optimized to maintain the text topology. We also develop an authoring tool that allows intuitive human control over the generation process. A questionnaire study provides evidence that the output results are aesthetically pleasing and preserve the animation patterns of the original GIF well, with participants perceiving emotional semantics similar to those of the original GIF. In addition, we evaluate the utility and effectiveness of our approach through a workshop with general users and designers.
https://doi.org/10.1145/3586183.3606813
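
The core transfer step described above can be sketched concretely, albeit in simplified form: the code below estimates, per frame, a single global affine transform that carries the GIF anchor's key points from the first frame to frame t, and applies that transform to the text's control points. The paper uses local affine transformations and an additional topology-preserving optimization, neither of which is reproduced here; this is only an illustrative approximation.

import numpy as np

def fit_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    # Least-squares 2x3 affine A such that dst ≈ [src, 1] @ A.T,
    # for matched point sets src, dst of shape (N, 2).
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])                   # (N, 3)
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2)
    return A.T                                   # (2, 3)

def transfer_animation(anchor_traj: np.ndarray, text_pts: np.ndarray) -> np.ndarray:
    # anchor_traj: (T, N, 2) key-point trajectories tracked on the GIF anchor.
    # text_pts: (M, 2) control points of the text in its rest pose.
    # Returns (T, M, 2) animated control points.
    ones = np.ones((text_pts.shape[0], 1))
    X = np.hstack([text_pts, ones])
    frames = []
    for t in range(anchor_traj.shape[0]):
        A = fit_affine(anchor_traj[0], anchor_traj[t])
        frames.append(X @ A.T)
    return np.stack(frames)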
We introduce an end-to-end interactive system for mental face reconstruction: the challenging task of visually reconstructing a face image that a person has only in their mind. In contrast to existing methods that suffer from low usability and high mental load, our approach only requires the user to rank images over multiple iterations according to their perceived similarity to the mental image. Based on these rankings, our mental face reconstruction system extracts image features in each iteration, combines them into a joint feature vector, and then uses a generative model to visually reconstruct the mental image. To avoid the need for collecting large amounts of human training data, we further propose a computational user model that can simulate human ranking behaviour using data from an online crowd-sourcing study (N=215). Results from a 12-participant user study show that our method produces reconstructions visually comparable to those of existing approaches while achieving significantly higher usability, lower perceived workload, and 40% faster reconstruction. In addition, results from a third, 22-participant lineup study in which we validated our reconstructions on a face ranking task show an identification rate of 55.3%, which is in line with prior work. These results represent an important step towards new interactive intelligent systems that can robustly and effortlessly reconstruct a user's mental image.
https://doi.org/10.1145/3586183.3606795
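
A greatly simplified sketch of the ranking-based loop described above follows: candidate images carry latent feature vectors, the user's ranking is converted into weights, and the weighted combination is decoded by a generative model into the next reconstruction. The weighting scheme and the helper callables (decode, sample_around, show_and_rank) are illustrative assumptions, not the authors' method.

import numpy as np

def combine_by_rank(latents: np.ndarray, ranking: list) -> np.ndarray:
    # latents: (N, D) feature vectors of the shown images.
    # ranking: image indices ordered from most to least similar to the
    # mental image. Higher-ranked images get exponentially larger weights
    # (an assumed scheme, purely for illustration).
    weights = np.array([0.5 ** r for r in range(len(ranking))])
    weights /= weights.sum()
    return (weights[:, None] * latents[ranking]).sum(axis=0)

def reconstruction_loop(decode, sample_around, show_and_rank,
                        latents: np.ndarray, iters: int = 5):
    # decode: latent -> image; sample_around: latent -> new candidate latents;
    # show_and_rank: images -> user ranking. All three are placeholders.
    estimate = latents.mean(axis=0)
    for _ in range(iters):
        images = [decode(z) for z in latents]
        ranking = show_and_rank(images)
        estimate = combine_by_rank(latents, ranking)
        latents = sample_around(estimate)
    return decode(estimate)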
Creative coding tasks are often exploratory in nature. When producing digital artwork, artists usually begin with a high-level semantic construct such as a “stained glass filter” and programmatically implement it by varying code parameters such as shape, color, lines, and opacity to produce visually appealing results. Interviews with artists reveal that translating semantic constructs into program syntax can be effortful, and that current programming tools do not lend themselves well to rapid creative exploration. To address these challenges, we introduce Spellburst, a large language model (LLM) powered creative-coding environment. Spellburst provides (1) a node-based interface that allows artists to create generative art and explore variations through branching and merging operations, (2) expressive prompt-based interactions to engage in semantic programming, and (3) dynamic prompt-driven interfaces and direct code editing to seamlessly switch between semantic and syntactic exploration. Our evaluation with artists demonstrates Spellburst’s potential to enhance creative coding practices and inform the design of computational creativity tools that bridge semantic and syntactic spaces.
https://doi.org/10.1145/3586183.3606719
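
One way to picture the branching and merging operations Spellburst describes is as a node graph in which every node stores the prompt that produced it and the generated code. The tiny sketch below is an assumed data structure for illustration only, not Spellburst's implementation.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    prompt: str                          # the prompt that produced this node
    code: str = ""                       # generated sketch code, e.g. p5.js
    parents: List["Node"] = field(default_factory=list)

def branch(parent: Node, new_prompt: str) -> Node:
    # Explore a variation: a child node derived from a single parent.
    return Node(prompt=new_prompt, parents=[parent])

def merge(a: Node, b: Node, combined_prompt: Optional[str] = None) -> Node:
    # Combine two explorations; the resulting prompt could be handed to an
    # LLM to reconcile both parents' code.
    prompt = combined_prompt or f"Combine: ({a.prompt}) with ({b.prompt})"
    return Node(prompt=prompt, parents=[a, b])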
Color filters are ubiquitous across visual digital media due to their transformative effect. However, it can be difficult to understand how a color filter will affect an image, especially for novices. We argue that, to become experts, novices need to develop Goodwin’s notion of Professional Vision so that, like experts, they can "see" and interpret their work in terms of their domain knowledge. Using the theory of Professional Vision, we present two design objectives for systems that aim to help users develop expertise. These objectives guided the development of Color Field, an interactive visualization of color filters as a vector field over the Hue-Saturation-Lightness color space. We conducted an exploratory user study in which five color grading novices and four experts were asked to analyze color filters. We found that Color Field enabled multiple strategies for making sense of filters (e.g., reviewing the overall shape of the vector field) and discussing them (e.g., using spatial language). We conclude with other applications of Color Field and future work that leverages Professional Vision in HCI.
https://doi.org/10.1145/3586183.3606828
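
The vector-field idea behind Color Field can be illustrated with a short sketch: sample colors over hue and saturation at a fixed lightness, push each sample through a filter, and record where it lands in HSL space; the resulting displacements form a field that could be drawn as arrows. This is one plausible reading of the abstract rather than the authors' implementation, and the sample grid and example filter are arbitrary.

import colorsys
import numpy as np

def filter_displacements(apply_filter, hue_steps=12, sat_steps=6, lightness=0.5):
    # apply_filter: (r, g, b) -> (r, g, b), all components in [0, 1].
    # Returns a list of (original_hsl, displacement_in_hsl) pairs that could
    # be drawn as arrows over the hue-saturation plane.
    field = []
    for h in np.linspace(0.0, 1.0, hue_steps, endpoint=False):
        for s in np.linspace(0.2, 1.0, sat_steps):
            rgb = colorsys.hls_to_rgb(h, lightness, s)
            r2, g2, b2 = apply_filter(*rgb)
            h2, l2, s2 = colorsys.rgb_to_hls(r2, g2, b2)
            original = (h, s, lightness)
            displacement = (h2 - h, s2 - s, l2 - lightness)
            field.append((original, displacement))
    return field

# Example: an arbitrary "warming" filter that boosts red and damps blue.
warm = lambda r, g, b: (min(1.0, r * 1.1), g, b * 0.9)
vectors = filter_displacements(warm)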