People speak aloud to externalize their thoughts as one way to help clarify and organize them. Although speech-to-text can capture these thoughts, the resulting transcripts can be difficult to read and make sense of due to disfluencies, repetitions, and potential disorganization. To support thinking through verbalization, we introduce ORALITY, which extracts key information from spoken content and performs semantic analysis through LLMs to form a node-link diagram on an interactive canvas. Instead of reading and working with transcripts, users can manipulate clusters of nodes and give verbal instructions to re-extract and organize the content in other ways. ORALITY also provides AI-generated inspirational questions and detection of logical conflicts. We conducted a lab study with twelve participants comparing ORALITY against speech interaction with ChatGPT. We found that ORALITY better supports users in clarifying and developing their thoughts. The findings also identify the affordances of both graphical and conversational thought-clarification tools and inform design implications.
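The abstract does not detail ORALITY's extraction pipeline; as a minimal sketch, the loop below shows how an LLM could be asked to turn a raw transcript into nodes and links for an interactive canvas. The `call_llm` stub, the prompt wording, and the JSON schema are assumptions for illustration, not the authors' implementation.

```python
import json
from dataclasses import dataclass

@dataclass
class Node:
    id: str
    label: str

@dataclass
class Link:
    source: str
    target: str
    relation: str

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; any chat-completion client could stand in here."""
    raise NotImplementedError

def transcript_to_graph(transcript: str) -> tuple[list[Node], list[Link]]:
    # Ask the model to extract key ideas and their relations as JSON,
    # ignoring disfluencies and repetitions in the raw speech.
    prompt = (
        "Extract the key ideas from this spoken transcript as JSON with "
        '"nodes": [{"id", "label"}] and "links": [{"source", "target", "relation"}].\n\n'
        + transcript
    )
    data = json.loads(call_llm(prompt))
    nodes = [Node(**n) for n in data["nodes"]]
    links = [Link(**l) for l in data["links"]]
    return nodes, links
```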
Web-based activities span multiple webpages. However, conventional browsers with stacks of tabs cannot support operating on and synthesizing large volumes of information across pages. While recent AI systems enable fully automated web browsing and information synthesis, they often diminish user agency and hinder contextual understanding. We explore how AI could instead augment user interactions with content across webpages and mitigate cognitive and manual effort. Drawing on the literature on information tasks and web browsing challenges, and through an iterative design process, we present novel interactions with our prototype web browser, Orca. Leveraging AI, Orca supports user-driven exploration, operation, organization, and synthesis of web content at scale. To enable browsing at scale, webpages are treated as malleable materials that humans and AI can collaboratively manipulate and compose into a dynamic, browser-level workspace. Our evaluation revealed an increased "appetite" for information foraging, enhanced control, and more flexible sensemaking across a broader web information landscape.
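Orca's interaction techniques are not spelled out in the abstract; as a rough sketch under assumed names, the snippet treats each fetched page as a collection of snippets that a relevance filter (here a trivial keyword check standing in for an AI scorer) prunes and composes into one browser-level workspace.

```python
from dataclasses import dataclass, field

@dataclass
class Snippet:
    url: str
    heading: str
    text: str

@dataclass
class Workspace:
    question: str
    snippets: list[Snippet] = field(default_factory=list)

def relevant(snippet: Snippet, question: str) -> bool:
    """Hypothetical relevance check; an LLM scorer could replace this keyword overlap."""
    terms = set(question.lower().split())
    return bool(terms & set(snippet.text.lower().split()))

def compose_workspace(question: str, pages: dict[str, list[Snippet]]) -> Workspace:
    # Pull snippets from many open pages into one workspace,
    # keeping only content that bears on the user's question.
    ws = Workspace(question)
    for snippets in pages.values():
        ws.snippets.extend(s for s in snippets if relevant(s, question))
    return ws
```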
Sketching provides an intuitive way to convey dynamic intent in animation authoring (i.e., how elements change over time and space), making it a natural medium for automatic content creation. Yet existing approaches often constrain sketches to fixed command tokens or predefined visual forms, overlooking their free-form nature and the central role of humans in shaping intention. To address this, we introduce an interaction paradigm in which users convey dynamic intent to a vision–language model via free-form sketching, instantiated here in a sketch-storyboard-to-motion-graphics workflow. We implement an interface and improve it through a three-stage study with 24 participants. The study shows how sketches convey motion with minimal input, how their inherent ambiguity requires user involvement for clarification, and how sketches can visually guide video refinement. Our findings reveal the potential of sketch–AI interaction to bridge the gap between intention and outcome, and demonstrate its applicability to 3D animation and video generation.
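As an illustrative sketch (the paper's actual workflow is not specified in the abstract), the code below passes a storyboard frame and a short disambiguating hint to a vision–language model and parses the reply into simple keyframes; `call_vlm` and the keyframe schema are assumptions.

```python
import json
from dataclasses import dataclass

@dataclass
class Keyframe:
    time: float   # seconds
    x: float
    y: float
    scale: float

def call_vlm(image_bytes: bytes, prompt: str) -> str:
    """Hypothetical vision-language model call."""
    raise NotImplementedError

def sketch_to_keyframes(storyboard_png: bytes, hint: str) -> list[Keyframe]:
    # The free-form sketch carries the dynamic intent; the text hint only
    # disambiguates, so minimal typing is required from the user.
    prompt = (
        "Interpret the arrows and annotations in this storyboard sketch as motion. "
        'Return JSON: [{"time", "x", "y", "scale"}]. Hint: ' + hint
    )
    return [Keyframe(**k) for k in json.loads(call_vlm(storyboard_png, prompt))]
```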
Hand–Object Interaction (HOI) is a key interaction component in Virtual Reality (VR). However, designing HOI still requires manual effort to decide how objects should be selected and manipulated, while also considering user abilities, which leads to time-consuming refinements. We present HOICraft, a VLM-based in-situ HOI authoring tool that enables part-level interaction design in VR. HOICraft assists designers by recommending interactable elements from 3D objects, customizing HOI design properties, and mapping hand movements to virtual object behaviors. We conducted a formative study with three expert VR designers to identify five representative HOI designs that support diverse user experiences. Building upon preference data from 20 participants, we developed an HOI mapping module with in-context learning. In a user study with 12 VR interaction designers, HOI mapping from HOICraft significantly reduced trial-and-error iterations compared to manual authoring. Finally, we assessed the usability of HOICraft, demonstrating its effectiveness for HOI design in VR.
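The abstract only names the HOI mapping module; a minimal sketch of in-context learning for gesture-to-behavior mapping might look like the following, where the few-shot examples, the prompt format, and the `call_llm` helper are assumed for illustration rather than taken from HOICraft.

```python
from dataclasses import dataclass

@dataclass
class MappingExample:
    object_part: str
    hand_motion: str
    object_behavior: str

# A handful of designer-preference examples serve as in-context demonstrations.
EXAMPLES = [
    MappingExample("door handle", "grasp and rotate wrist", "handle turns, door unlatches"),
    MappingExample("drawer front", "pinch and pull toward body", "drawer slides open"),
]

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

def suggest_mapping(object_part: str, hand_motion: str) -> str:
    # Few-shot prompt: prior preference data conditions the model's suggestion
    # for how this object part should respond to the given hand movement.
    shots = "\n".join(
        f"Part: {e.object_part} | Motion: {e.hand_motion} -> Behavior: {e.object_behavior}"
        for e in EXAMPLES
    )
    prompt = f"{shots}\nPart: {object_part} | Motion: {hand_motion} -> Behavior:"
    return call_llm(prompt).strip()
```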
Programmers formulate prompts for code generation based on their understanding of the problem domain and of the LLM's ability to execute instructions. A deficiency in either understanding yields inadequate generated code, requiring programmers to revise their understanding based on the deficiencies they observe in the generated code.
We propose ReFiQ (``Result-first Queries''), a tool that simultaneously presents concrete run-time execution results for multiple generated code variations of a single prompt. In an exploratory user study (n=8), we observed that programmers systematically compared those results to identify issues in their understanding, deliberately formulated single prompts to explore a range of options for a domain concept, and were more often encouraged to make deliberate, observation-informed decisions instead of directly applying generated code. Participants required fewer iterations to arrive at a satisfying solution in ReFiQ than in our baseline, GitHub Copilot.
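ReFiQ's internals are not described in the abstract; a rough sketch of the result-first idea, assuming a hypothetical `generate_variants` helper that returns several candidate snippets each defining a `solve` function, is to run every candidate on the same sample input and present the outcomes side by side before the programmer reads any code.

```python
import traceback

def generate_variants(prompt: str, n: int) -> list[str]:
    """Hypothetical code-generation call returning n candidate snippets,
    each of which defines a function named `solve`."""
    raise NotImplementedError

def run_variants(prompt: str, sample_input, n: int = 3) -> list[str]:
    # Execute every candidate on the same concrete input so results,
    # not code, are compared first.
    results = []
    for i, code in enumerate(generate_variants(prompt, n)):
        namespace: dict = {}
        try:
            exec(code, namespace)                      # define the candidate's `solve`
            output = namespace["solve"](sample_input)  # run it on the sample input
            results.append(f"variant {i}: {output!r}")
        except Exception:
            results.append(f"variant {i}: raised\n{traceback.format_exc(limit=1)}")
    return results
```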
Text prompts are the most common medium for human–generative AI (GenAI) communication. Though convenient, they make it challenging to convey fine-grained and referential intent. One promising solution is to combine text prompts with precise GUI interactions, such as brushing and clicking. However, a formal model that captures synergistic designs between prompts and interactions is lacking, hindering their comparison and innovation. To fill this gap, through an iterative and deductive process, we develop the Interaction-Augmented Instruction (IAI) model, a compact entity–relation graph formalizing how the combination of interactions and text prompts enhances human–GenAI communication. With the model, we distill twelve recurring and composable atomic interaction paradigms from prior tools, verifying the model's capability to facilitate systematic design characterization and comparison. Four usage scenarios further demonstrate the model's utility in applying, refining, and innovating these paradigms. These results illustrate the IAI model's descriptive, discriminative, and generative power for shaping future GenAI systems.
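The IAI model is described as a compact entity-relation graph; a minimal sketch of how one atomic paradigm (brushing a region and then prompting about it) could be encoded is shown below. The entity kinds and relation names are assumptions for illustration, not the paper's vocabulary.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    kind: str   # e.g. "prompt", "interaction", "content"
    label: str

@dataclass
class IAIGraph:
    entities: list[Entity] = field(default_factory=list)
    relations: list[tuple[Entity, str, Entity]] = field(default_factory=list)

    def relate(self, src: Entity, rel: str, dst: Entity) -> None:
        # Record that one entity augments, references, or constrains another.
        self.relations.append((src, rel, dst))

# One atomic paradigm: a brush interaction pins down the referent of a text prompt.
graph = IAIGraph()
region = Entity("content", "brushed image region")
brush = Entity("interaction", "brush selection")
prompt = Entity("prompt", "make this area brighter")
graph.entities += [region, brush, prompt]
graph.relate(brush, "selects", region)
graph.relate(prompt, "refers_to", region)
```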
While generative AI enables high-fidelity UI generation from text prompts, users struggle to articulate design intent and to evaluate or refine results, creating gulfs of execution and evaluation. To understand the information needed for UI generation, we conducted a thematic analysis of UI prompting guidelines, identifying key design semantics and discovering that they are hierarchical and interdependent. Leveraging these findings, we developed a system that enables users to specify semantics, visualize their relationships, and trace how semantics are reflected in generated UIs. With semantics serving as an intermediate representation between human intent and AI output, our system bridges both gulfs by making requirements explicit and outcomes interpretable. A comparative user study suggests that our approach enhances users' perceived control over intent expression and outcome interpretation, and facilitates more predictable iterative refinement. Our work demonstrates how explicit semantic representation enables systematic and explainable exploration of design possibilities in AI-driven UI design.
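As a sketch of semantics serving as an intermediate representation (the system's actual schema is not given in the abstract), hierarchical design semantics could be kept in a small tree that is both rendered into the generation prompt and used to check which requirements a generated UI description satisfies. The field names and the naive containment check below are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Semantic:
    name: str
    value: str
    children: list["Semantic"] = field(default_factory=list)

def to_prompt(sem: Semantic, depth: int = 0) -> str:
    # Flatten the hierarchy into explicit requirements for the UI generator.
    line = "  " * depth + f"- {sem.name}: {sem.value}"
    return "\n".join([line] + [to_prompt(c, depth + 1) for c in sem.children])

def unmet(sem: Semantic, generated_ui_description: str) -> list[str]:
    # Trace each semantic back into the output to make the result interpretable.
    missing = [] if sem.value.lower() in generated_ui_description.lower() else [sem.name]
    for c in sem.children:
        missing += unmet(c, generated_ui_description)
    return missing

spec = Semantic("purpose", "travel booking home page", [
    Semantic("layout", "hero section with search bar", [
        Semantic("component", "date range picker"),
    ]),
    Semantic("style", "calm blue palette"),
])
print(to_prompt(spec))
```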