People are increasingly turning to large language models (LLMs) for complex information tasks like academic research or planning a move to another city. However, while these tasks often require working in a nonlinear manner (e.g., arranging information spatially to organize and make sense of it), current interfaces for interacting with LLMs are generally linear, designed to support conversational interaction. To address this limitation and explore how we can support LLM-powered exploration and sensemaking, we developed Sensecape, an interactive system designed to support complex information tasks with an LLM by enabling users to (1) manage the complexity of information through multilevel abstraction and (2) switch seamlessly between foraging and sensemaking. Our within-subject user study reveals that Sensecape empowers users to explore more topics and structure their knowledge hierarchically, thanks to its externalization of levels of abstraction. We contribute implications for LLM-based workflows and interfaces for information tasks.
https://doi.org/10.1145/3586183.3606756
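To make the multilevel-abstraction idea concrete, here is a minimal sketch of how a sensemaking canvas might let users collapse foraged LLM responses into higher-level summary nodes. All names (`Node`, `abstract_up`, `llm_summarize`) are illustrative assumptions for this sketch, not Sensecape's published implementation; `llm_summarize` stands in for a real LLM call.

```python
# Hypothetical sketch of multilevel abstraction on a sensemaking canvas.
from dataclasses import dataclass, field


def llm_summarize(texts: list[str]) -> str:
    """Placeholder for an LLM call that condenses child nodes into one label."""
    return "Summary of: " + "; ".join(t[:20] for t in texts)


@dataclass
class Node:
    text: str                        # content foraged from the LLM
    level: int = 0                   # 0 = raw response, higher = more abstract
    children: list["Node"] = field(default_factory=list)


def abstract_up(nodes: list[Node]) -> Node:
    """Collapse a group of nodes into a single higher-level summary node."""
    return Node(
        text=llm_summarize([n.text for n in nodes]),
        level=max(n.level for n in nodes) + 1,
        children=nodes,
    )


# Usage: forage three responses, then abstract them into one summary node
# the user can expand again when switching back to sensemaking.
leaves = [Node("Cost of living in Berlin"), Node("Visa process"), Node("Finding housing")]
overview = abstract_up(leaves)
print(overview.level, overview.text)
```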
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors. For example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture—observation, planning, and reflection—each contribute critically to the believability of agent behavior. By fusing large language models with computational interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
https://doi.org/10.1145/3586183.3606763
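The retrieval portion of this architecture can be sketched compactly: score each natural-language memory by recency, importance, and relevance, and hand the top few back to the LLM when planning behavior. The decay constant, the equal weighting, and the keyword-overlap `relevance` stub below are assumptions made for illustration; a real system would use the model's own importance ratings and embedding similarity.

```python
# Minimal sketch of memory-stream retrieval: recency + importance + relevance.
import math
import time
from dataclasses import dataclass, field


def relevance(query: str, text: str) -> float:
    """Placeholder for embedding similarity; here, crude keyword overlap (0..1)."""
    shared = set(query.lower().split()) & set(text.lower().split())
    return len(shared) / max(len(query.split()), 1)


@dataclass
class Memory:
    text: str
    importance: float                       # e.g., rated 0..1 by an LLM at creation
    created: float = field(default_factory=time.time)


def retrieve(memories: list[Memory], query: str, now: float, k: int = 3) -> list[Memory]:
    """Return the k memories with the highest combined score."""
    def score(m: Memory) -> float:
        recency = math.exp(-(now - m.created) / 3600.0)   # hourly decay (assumed)
        return recency + m.importance + relevance(query, m.text)
    return sorted(memories, key=score, reverse=True)[:k]


stream = [
    Memory("Isabella is planning a Valentine's Day party", importance=0.9),
    Memory("Ate breakfast at the cafe", importance=0.1),
    Memory("Klaus mentioned he likes poetry", importance=0.4),
]
for m in retrieve(stream, "who to invite to the party", now=time.time()):
    print(m.text)
```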
Large language models (LLMs) have recently soared in popularity due to their ease of access and the unprecedented ability to synthesize text responses to diverse user questions. However, LLMs like ChatGPT present significant limitations in supporting complex information tasks due to the insufficient affordances of the text-based medium and linear conversational structure. Through a formative study with ten participants, we found that LLM interfaces often present long-winded responses, making it difficult for people to quickly comprehend and interact flexibly with various pieces of information, particularly during more complex tasks. We present Graphologue, an interactive system that converts text-based responses from LLMs into graphical diagrams to facilitate information-seeking and question-answering tasks. Graphologue employs novel prompting strategies and interface designs to extract entities and relationships from LLM responses and to construct node-link diagrams in real time. Further, users can interact with the diagrams to flexibly adjust the graphical presentation and to submit context-specific prompts to obtain more information. Using these diagrams, Graphologue enables graphical, non-linear dialogues between humans and LLMs, facilitating information exploration, organization, and comprehension.
https://doi.org/10.1145/3586183.3606737
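Graphologue's actual prompting strategies and annotation format are more elaborate than this, but as a hedged illustration, suppose the LLM is prompted to emit one "subject | relation | object" triple per line; those triples can then be parsed into a node-link structure for rendering. The delimiter format and function names here are assumptions for the sketch.

```python
# Toy pipeline: LLM response (assumed triple format) -> node-link graph.
from collections import defaultdict

llm_response = """\
LLMs | generate | text responses
Graphologue | converts | text responses
Graphologue | constructs | node-link diagrams
"""


def parse_triples(text: str) -> list[tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines, skipping malformed ones."""
    triples = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def build_graph(triples):
    """Adjacency list: node -> list of (relation, neighbor)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


graph = build_graph(parse_triples(llm_response))
for node, edges in graph.items():
    for rel, neighbor in edges:
        print(f"{node} --{rel}--> {neighbor}")
```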
Large Language Models (LLMs) have become the backbone of numerous writing interfaces with the goal of supporting end-users across diverse writing tasks. While LLMs reduce the effort of manual writing, end-users may need to experiment and iterate with various generation configurations (e.g., inputs and model parameters) until results meet their goals. However, these interfaces are not designed for experimentation and iteration, and can restrict how end-users track, compare, and combine configurations. In this work, we present "cells, generators, and lenses", a framework for designing interfaces around interactive objects that embody configuration components (i.e., input, model, and output). Interface designers can apply our framework to produce interfaces that enable end-users to create variations of these objects, combine and recombine them into new configurations, and compare them in parallel to iterate and experiment with LLMs efficiently. To showcase how our framework generalizes to diverse writing tasks, we redesigned three different interfaces for story writing, copywriting, and email composing, and, to demonstrate its effectiveness in supporting end-users, we conducted a comparative study (N=18) in which participants used our interactive objects to generate more outputs and experiment more. Finally, we investigated the usability of the framework through a workshop with designers (N=3), where we observed that it served as both a bootstrapping tool and a source of inspiration in the design process.
https://doi.org/10.1145/3586183.3606833
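The framework's three object types can be rendered as plain data types. The class names follow the paper's terminology, but the fields, the `fake_model` stub, and the cross-product combination below are assumptions made for this sketch, not the paper's implementation.

```python
# Illustrative rendering of "cells, generators, and lenses" as data types.
from dataclasses import dataclass, field
from itertools import product


def fake_model(prompt: str, temperature: float) -> str:
    """Placeholder for a real LLM call."""
    return f"[t={temperature}] completion of: {prompt}"


@dataclass
class Cell:
    """An input fragment the user can vary and recombine."""
    text: str


@dataclass
class Generator:
    """A model configuration that turns cells into an output."""
    temperature: float

    def run(self, cells: list[Cell]) -> str:
        return fake_model(" ".join(c.text for c in cells), self.temperature)


@dataclass
class Lens:
    """A view that lays out outputs side by side for comparison."""
    outputs: list[str] = field(default_factory=list)

    def show(self) -> None:
        for i, out in enumerate(self.outputs):
            print(f"option {i}: {out}")


# Cross every cell variation with every generator configuration,
# then compare all results in parallel through one lens.
openings = [Cell("Dear team,"), Cell("Hi all,")]
generators = [Generator(0.2), Generator(0.9)]
lens = Lens([g.run([c]) for c, g in product(openings, generators)])
lens.show()
```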
In argumentative writing, writers must brainstorm hierarchical writing goals, ensure the persuasiveness of their arguments, and revise and organize their plans through drafting. Recent advances in large language models (LLMs) have made interactive text generation through a chat interface (e.g., ChatGPT) possible. However, this approach often neglects implicit writing context and user intent, lacks support for user control and autonomy, and provides limited assistance for sensemaking and revising writing plans. To address these challenges, we introduce VISAR, an AI-enabled writing assistant system designed to help writers brainstorm and revise hierarchical goals within their writing context, organize argument structures through synchronized text editing and visual programming, and enhance persuasiveness with argumentation spark recommendations. VISAR allows users to explore, experiment with, and validate their writing plans using automatic draft prototyping. A controlled lab study confirmed the usability and effectiveness of VISAR in facilitating the argumentative writing planning process.
https://doi.org/10.1145/3586183.3606800
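One way to picture hierarchical goal planning with automatic draft prototyping is a goal tree whose leaf goals are expanded into prototype paragraphs. This is a speculative sketch under that assumption, not VISAR's implementation; `draft_paragraph` stands in for an LLM call.

```python
# Hypothetical goal tree with draft prototyping via a depth-first walk.
from dataclasses import dataclass, field


def draft_paragraph(goal: str) -> str:
    """Placeholder for an LLM call that drafts text toward one goal."""
    return f"(draft paragraph arguing: {goal})"


@dataclass
class Goal:
    claim: str
    subgoals: list["Goal"] = field(default_factory=list)


def prototype_draft(goal: Goal) -> str:
    """Depth-first walk: leaf goals become prototype paragraphs."""
    if not goal.subgoals:
        return draft_paragraph(goal.claim)
    body = "\n".join(prototype_draft(g) for g in goal.subgoals)
    return f"Section: {goal.claim}\n{body}"


plan = Goal("Remote work should stay", [
    Goal("It widens the hiring pool"),
    Goal("It cuts commute emissions"),
])
print(prototype_draft(plan))
```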
While diffusion-based text-to-image (T2I) models provide a simple and powerful way to generate images, guiding this generation remains a challenge. For concepts that are difficult to describe through language, users may struggle to create prompts. Moreover, many of these models are built as end-to-end systems, lacking support for iterative shaping of the image. In response, we introduce PromptPaint, which combines T2I generation with interactions that model how we use colored paints. PromptPaint allows users to go beyond language and mix prompts to express challenging concepts. Just as we iteratively tune colors through the layered placement of paint on a physical canvas, PromptPaint allows users to apply different prompts to different parts of the generative process and different areas of the canvas. Through a set of studies, we characterize different approaches for mixing prompts, design trade-offs, and technical and socio-technical challenges to the use of generative models. With PromptPaint, we provide insight into future steerable generative tools.
https://doi.org/10.1145/3586183.3606777
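The paint-mixing metaphor maps naturally onto weighted blends of prompt embeddings applied over different denoising steps. The sketch below is a toy illustration under that assumption: `embed` and `denoise_step` are placeholders for a real text encoder and diffusion step, and the vectors are not actual latents.

```python
# Toy sketch: blend prompt embeddings like paints, scheduled across steps.
def embed(prompt: str) -> list[float]:
    """Placeholder embedding; real systems use the model's text encoder."""
    return [float(ord(c) % 7) for c in prompt[:4].ljust(4)]


def mix(embeddings: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of prompt embeddings, like mixing paints."""
    total = sum(weights)
    return [
        sum(w * e[i] for w, e in zip(weights, embeddings)) / total
        for i in range(len(embeddings[0]))
    ]


def denoise_step(latent: list[float], cond: list[float]) -> list[float]:
    """Placeholder for one diffusion denoising step under a condition."""
    return [0.9 * x + 0.1 * c for x, c in zip(latent, cond)]


# Early steps steer toward "misty"; later steps toward a 50/50 blend,
# mirroring how PromptPaint layers prompts over the generative process.
misty, neon = embed("misty forest"), embed("neon forest")
latent = [0.0] * 4
for step in range(10):
    cond = misty if step < 5 else mix([misty, neon], [0.5, 0.5])
    latent = denoise_step(latent, cond)
print([round(v, 2) for v in latent])
```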