People are increasingly turning to large language models (LLMs) for complex information tasks like academic research or planning a move to another city. However, while these tasks often require working in a nonlinear manner (e.g., arranging information spatially to organize and make sense of it), current interfaces for interacting with LLMs are generally linear, designed to support conversational interaction. To address this limitation and explore how we can support LLM-powered exploration and sensemaking, we developed Sensecape, an interactive system designed to support complex information tasks with an LLM by enabling users to (1) manage the complexity of information through multilevel abstraction and (2) switch seamlessly between foraging and sensemaking. Our within-subject user study reveals that Sensecape empowers users to explore more topics and structure their knowledge hierarchically, thanks to its externalization of levels of abstraction. We contribute implications for LLM-based workflows and interfaces for information tasks.
https://doi.org/10.1145/3586183.3606756
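To make the multilevel-abstraction idea concrete, here is a minimal sketch of how a sensemaking canvas might let users collapse foraged LLM responses into higher-level summary nodes. All names (`Node`, `abstract_up`, `llm_summarize`) are illustrative assumptions for this sketch, not Sensecape's published implementation; `llm_summarize` stands in for a real LLM call.

```python
# Hypothetical sketch of multilevel abstraction on a sensemaking canvas.
from dataclasses import dataclass, field


def llm_summarize(texts: list[str]) -> str:
    """Placeholder for an LLM call that condenses child nodes into one label."""
    return "Summary of: " + "; ".join(t[:20] for t in texts)


@dataclass
class Node:
    text: str                        # content foraged from the LLM
    level: int = 0                   # 0 = raw response, higher = more abstract
    children: list["Node"] = field(default_factory=list)


def abstract_up(nodes: list[Node]) -> Node:
    """Collapse a group of nodes into a single higher-level summary node."""
    return Node(
        text=llm_summarize([n.text for n in nodes]),
        level=max(n.level for n in nodes) + 1,
        children=nodes,
    )


# Usage: forage three responses, then abstract them into one summary node
# the user can expand again when switching back to sensemaking.
leaves = [Node("Cost of living in Berlin"), Node("Visa process"), Node("Finding housing")]
overview = abstract_up(leaves)
print(overview.level, overview.text)
```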
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors. For example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture—observation, planning, and reflection—each contribute critically to the believability of agent behavior. By fusing large language models with computational interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
https://doi.org/10.1145/3586183.3606763
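The retrieval portion of this architecture can be sketched compactly: score each natural-language memory by recency, importance, and relevance, and hand the top few back to the LLM when planning behavior. The decay constant, the equal weighting, and the keyword-overlap `relevance` stub below are assumptions made for illustration; a real system would use the model's own importance ratings and embedding similarity.

```python
# Minimal sketch of memory-stream retrieval: recency + importance + relevance.
import math
import time
from dataclasses import dataclass, field


def relevance(query: str, text: str) -> float:
    """Placeholder for embedding similarity; here, crude keyword overlap (0..1)."""
    shared = set(query.lower().split()) & set(text.lower().split())
    return len(shared) / max(len(query.split()), 1)


@dataclass
class Memory:
    text: str
    importance: float                       # e.g., rated 0..1 by an LLM at creation
    created: float = field(default_factory=time.time)


def retrieve(memories: list[Memory], query: str, now: float, k: int = 3) -> list[Memory]:
    """Return the k memories with the highest combined score."""
    def score(m: Memory) -> float:
        recency = math.exp(-(now - m.created) / 3600.0)   # hourly decay (assumed)
        return recency + m.importance + relevance(query, m.text)
    return sorted(memories, key=score, reverse=True)[:k]


stream = [
    Memory("Isabella is planning a Valentine's Day party", importance=0.9),
    Memory("Ate breakfast at the cafe", importance=0.1),
    Memory("Klaus mentioned he likes poetry", importance=0.4),
]
for m in retrieve(stream, "who to invite to the party", now=time.time()):
    print(m.text)
```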
Large language models (LLMs) have recently soared in popularity due to their ease of access and the unprecedented ability to synthesize text responses to diverse user questions. However, LLMs like ChatGPT present significant limitations in supporting complex information tasks due to the insufficient affordances of the text-based medium and linear conversational structure. Through a formative study with ten participants, we found that LLM interfaces often present long-winded responses, making it difficult for people to quickly comprehend and interact flexibly with various pieces of information, particularly during more complex tasks. We present Graphologue, an interactive system that converts text-based responses from LLMs into graphical diagrams to facilitate information-seeking and question-answering tasks. Graphologue employs novel prompting strategies and interface designs to extract entities and relationships from LLM responses and to construct node-link diagrams in real time. Further, users can interact with the diagrams to flexibly adjust the graphical presentation and to submit context-specific prompts to obtain more information. Using these diagrams, Graphologue enables graphical, non-linear dialogues between humans and LLMs, facilitating information exploration, organization, and comprehension.
https://doi.org/10.1145/3586183.3606737
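Graphologue's actual prompting strategies and annotation format are more elaborate than this, but as a hedged illustration, suppose the LLM is prompted to emit one "subject | relation | object" triple per line; those triples can then be parsed into a node-link structure for rendering. The delimiter format and function names here are assumptions for the sketch.

```python
# Toy pipeline: LLM response (assumed triple format) -> node-link graph.
from collections import defaultdict

llm_response = """\
LLMs | generate | text responses
Graphologue | converts | text responses
Graphologue | constructs | node-link diagrams
"""


def parse_triples(text: str) -> list[tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines, skipping malformed ones."""
    triples = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def build_graph(triples):
    """Adjacency list: node -> list of (relation, neighbor)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


graph = build_graph(parse_triples(llm_response))
for node, edges in graph.items():
    for rel, neighbor in edges:
        print(f"{node} --{rel}--> {neighbor}")
```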
Large Language Models (LLMs) have become the backbone of numerous writing interfaces with the goal of supporting end-users across diverse writing tasks. While LLMs reduce the effort of manual writing, end-users may need to experiment and iterate with various generation configurations (e.g., inputs and model parameters) until results meet their goals. However, these interfaces are not designed for experimentation and iteration, and can restrict how end-users track, compare, and combine configurations. In this work, we present "cells, generators, and lenses", a framework for designing interfaces around interactive objects that embody configuration components (i.e., input, model, and output). Interface designers can apply our framework to produce interfaces that enable end-users to create variations of these objects, combine and recombine them into new configurations, and compare them in parallel to iterate and experiment with LLMs efficiently. To showcase how our framework generalizes to diverse writing tasks, we redesigned three different interfaces for story writing, copywriting, and email composing, and, to demonstrate its effectiveness in supporting end-users, we conducted a comparative study (N=18) in which participants used our interactive objects to generate more outputs and experiment more. Finally, we investigated the usability of the framework through a workshop with designers (N=3), where we observed that it served as both a bootstrapping tool and a source of inspiration in the design process.
https://doi.org/10.1145/3586183.3606833
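The framework's three object types can be rendered as plain data types. The class names follow the paper's terminology, but the fields, the `fake_model` stub, and the cross-product combination below are assumptions made for this sketch, not the paper's implementation.

```python
# Illustrative rendering of "cells, generators, and lenses" as data types.
from dataclasses import dataclass, field
from itertools import product


def fake_model(prompt: str, temperature: float) -> str:
    """Placeholder for a real LLM call."""
    return f"[t={temperature}] completion of: {prompt}"


@dataclass
class Cell:
    """An input fragment the user can vary and recombine."""
    text: str


@dataclass
class Generator:
    """A model configuration that turns cells into an output."""
    temperature: float

    def run(self, cells: list[Cell]) -> str:
        return fake_model(" ".join(c.text for c in cells), self.temperature)


@dataclass
class Lens:
    """A view that lays out outputs side by side for comparison."""
    outputs: list[str] = field(default_factory=list)

    def show(self) -> None:
        for i, out in enumerate(self.outputs):
            print(f"option {i}: {out}")


# Cross every cell variation with every generator configuration,
# then compare all results in parallel through one lens.
openings = [Cell("Dear team,"), Cell("Hi all,")]
generators = [Generator(0.2), Generator(0.9)]
lens = Lens([g.run([c]) for c, g in product(openings, generators)])
lens.show()
```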
In argumentative writing, writers must brainstorm hierarchical writing goals, ensure the persuasiveness of their arguments, and revise and organize their plans through drafting. Recent advances in large language models (LLMs) have made interactive text generation through a chat interface (e.g., ChatGPT) possible. However, this approach often neglects implicit writing context and user intent, lacks support for user control and autonomy, and provides limited assistance for sensemaking and revising writing plans. To address these challenges, we introduce VISAR, an AI-enabled writing assistant system designed to help writers brainstorm and revise hierarchical goals within their writing context, organize argument structures through synchronized text editing and visual programming, and enhance persuasiveness with argumentation spark recommendations. VISAR allows users to explore, experiment with, and validate their writing plans using automatic draft prototyping. A controlled lab study confirmed the usability and effectiveness of VISAR in facilitating the argumentative writing planning process.
https://doi.org/10.1145/3586183.3606800
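One way to picture hierarchical goal planning with automatic draft prototyping is a goal tree whose leaf goals are expanded into prototype paragraphs. This is a speculative sketch under that assumption, not VISAR's implementation; `draft_paragraph` stands in for an LLM call.

```python
# Hypothetical goal tree with draft prototyping via a depth-first walk.
from dataclasses import dataclass, field


def draft_paragraph(goal: str) -> str:
    """Placeholder for an LLM call that drafts text toward one goal."""
    return f"(draft paragraph arguing: {goal})"


@dataclass
class Goal:
    claim: str
    subgoals: list["Goal"] = field(default_factory=list)


def prototype_draft(goal: Goal) -> str:
    """Depth-first walk: leaf goals become prototype paragraphs."""
    if not goal.subgoals:
        return draft_paragraph(goal.claim)
    body = "\n".join(prototype_draft(g) for g in goal.subgoals)
    return f"Section: {goal.claim}\n{body}"


plan = Goal("Remote work should stay", [
    Goal("It widens the hiring pool"),
    Goal("It cuts commute emissions"),
])
print(prototype_draft(plan))
```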
While diffusion-based text-to-image (T2I) models provide a simple and powerful way to generate images, guiding this generation remains a challenge. For concepts that are difficult to describe through language, users may struggle to create prompts. Moreover, many of these models are built as end-to-end systems, lacking support for iterative shaping of the image. In response, we introduce PromptPaint, which combines T2I generation with interactions that model how we use colored paints. PromptPaint allows users to go beyond language and mix prompts to express challenging concepts. Just as we iteratively tune colors through the layered placement of paint on a physical canvas, PromptPaint allows users to apply different prompts to different parts of the generative process and different areas of the canvas. Through a set of studies, we characterize different approaches for mixing prompts, design trade-offs, and technical and socio-technical challenges to the use of generative models. With PromptPaint, we provide insight into future steerable generative tools.
https://doi.org/10.1145/3586183.3606777
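The paint-mixing metaphor maps naturally onto weighted blends of prompt embeddings applied over different denoising steps. The sketch below is a toy illustration under that assumption: `embed` and `denoise_step` are placeholders for a real text encoder and diffusion step, and the vectors are not actual latents.

```python
# Toy sketch: blend prompt embeddings like paints, scheduled across steps.
def embed(prompt: str) -> list[float]:
    """Placeholder embedding; real systems use the model's text encoder."""
    return [float(ord(c) % 7) for c in prompt[:4].ljust(4)]


def mix(embeddings: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of prompt embeddings, like mixing paints."""
    total = sum(weights)
    return [
        sum(w * e[i] for w, e in zip(weights, embeddings)) / total
        for i in range(len(embeddings[0]))
    ]


def denoise_step(latent: list[float], cond: list[float]) -> list[float]:
    """Placeholder for one diffusion denoising step under a condition."""
    return [0.9 * x + 0.1 * c for x, c in zip(latent, cond)]


# Early steps steer toward "misty"; later steps toward a 50/50 blend,
# mirroring how PromptPaint layers prompts over the generative process.
misty, neon = embed("misty forest"), embed("neon forest")
latent = [0.0] * 4
for step in range(10):
    cond = misty if step < 5 else mix([misty, neon], [0.5, 0.5])
    latent = denoise_step(latent, cond)
print([round(v, 2) for v in latent])
```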