Natural Language

Conference Name
CHI 2022
Design Guidelines for Prompt Engineering Text-to-Image Generative Models
Abstract

Text-to-image generative models are a new and powerful way to generate visual artwork. However, the open-ended nature of text as interaction is double-edged; while users can input anything and have access to an infinite range of generations, they also must engage in brute-force trial and error with the text prompt when the result quality is poor. We conduct a study exploring what prompt keywords and model hyperparameters can help produce coherent outputs. In particular, we study prompts structured to include subject and style keywords and investigate success and failure modes of these prompts. Our evaluation of 5493 generations over the course of five experiments spans 51 abstract and concrete subjects as well as 51 abstract and figurative styles. From this evaluation, we present design guidelines that can help people produce better outcomes from text-to-image generative models.
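To make the studied prompt structure concrete, here is a minimal sketch of subject + style prompt composition; the template wording ("in the style of") and the keyword lists are illustrative assumptions, not the paper's exact experimental conditions.

```python
# Illustrative sketch only: the template phrasing and keyword lists are
# assumptions, not the paper's exact prompts.
from itertools import product

subjects = ["a lighthouse", "love"]   # one concrete, one abstract subject
styles = ["cubism", "a dream"]        # one figurative, one abstract style

def build_prompt(subject: str, style: str) -> str:
    """Compose a text-to-image prompt from a subject and a style keyword."""
    return f"{subject} in the style of {style}"

# Cross every subject with every style, mirroring the evaluation design.
prompts = [build_prompt(s, st) for s, st in product(subjects, styles)]
print(prompts)
```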

Authors
Vivian Liu
Columbia University, New York, New York, United States
Lydia B. Chilton
Columbia University, New York, New York, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3501825

AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Abstract

Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step. We first define a set of LLM primitive operations useful for Chain construction, then present an interactive system where users can modify these Chains, along with their intermediate results, in a modular way. In a 20-person user study, we found that Chaining not only improved the quality of task outcomes, but also significantly enhanced system transparency, controllability, and sense of collaboration. Additionally, we saw that users developed new ways of interacting with LLMs through Chains: they leveraged sub-tasks to calibrate model expectations, compared and contrasted alternative strategies by observing parallel downstream effects, and debugged unexpected model outputs by “unit-testing” sub-components of a Chain. In two case studies, we further explore how LLM Chains may be used in future applications.
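As a rough illustration of the Chaining idea, the sketch below runs prompt templates in sequence and keeps every intermediate result so it can be inspected or edited; `call_llm` is a hypothetical stand-in for any LLM API, and the two-step chain is an invented example, not one of the paper's case studies.

```python
# Hedged sketch of Chaining: each step's output becomes the next step's input.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a real LLM API call")  # hypothetical stand-in

def run_chain(task: str, step_templates: list[str]) -> list[str]:
    """Run prompt templates in sequence, keeping every intermediate result
    so a user could inspect, edit, or "unit-test" each step."""
    results = []
    current = task
    for template in step_templates:
        current = call_llm(template.format(input=current))
        results.append(current)
    return results

# Example two-step chain: brainstorm points, then draft from them.
steps = [
    "List three key points about: {input}",
    "Write a short paragraph covering these points:\n{input}",
]
# outputs = run_chain("the benefits of bike lanes", steps)
```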

Authors
Tongshuang Wu
University of Washington, Seattle, Washington, United States
Michael Terry
Google, Cambridge, Massachusetts, United States
Carrie J. Cai
Google, Mountain View, California, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3517582

Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models
Abstract

In this paper, we present a natural language code synthesis tool, GenLine, backed by 1) a large generative language model and 2) a set of task-specific prompts that create or change code. To understand the user experience of natural language code synthesis with these new types of models, we conducted a user study in which participants applied GenLine to two programming tasks. Our results indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges. In particular, participants felt that they needed to learn the model’s “syntax,” despite their input being natural language. Participants also struggled to form an accurate mental model of the types of requests the model can reliably translate and developed a set of strategies to debug model input. From these findings, we discuss design implications for future natural language code synthesis tools built using large generative language models.
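The sketch below shows one plausible shape for a task-specific, few-shot prompt that changes code, in the spirit of GenLine; the example format and the `call_llm` helper are assumptions, not the paper's actual prompts, which are not reproduced in the abstract.

```python
# Hypothetical few-shot prompt for a code-changing command; not GenLine's
# actual prompt format.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a real LLM API call")  # hypothetical stand-in

PROMPT = (
    "Change the CSS as requested.\n"
    "Request: make the heading red\n"
    "Code: h1 { color: red; }\n"
    "Request: {request}\n"
    "Code:"
)

def synthesize(request: str) -> str:
    # .replace avoids str.format choking on the literal braces in the CSS.
    return call_llm(PROMPT.replace("{request}", request))
```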

Authors
Ellen Jiang
Google, Cambridge, Massachusetts, United States
Edwin Toh
Google, Mountain View, California, United States
Alejandra Molina
Google, New York, New York, United States
Kristen Olson
Google, Seattle, Washington, United States
Claire Kayacik
Google, Mountain View, California, United States
Aaron Donsbach
Google, Seattle, Washington, United States
Carrie J. Cai
Google, Mountain View, California, United States
Michael Terry
Google, Cambridge, Massachusetts, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3501870

Towards Complete Icon Labeling in Mobile Applications
Abstract

Accurately recognizing icon types in mobile applications is integral to many tasks, including accessibility improvement, UI design search, and conversational agents. Existing research focuses on recognizing the most frequent icon types, but these technologies fail when encountering an unrecognized low-frequency icon. In this paper, we work towards complete coverage of icons in the wild. After annotating a large-scale icon dataset (327,879 icons) from iPhone apps, we found a highly uneven distribution: 98 common icon types covered 92.8% of icons, while 7.2% of icons were covered by more than 331 long-tail icon types. In order to label icons with widely varying occurrences in apps, our system uses an image classification model to recognize common icon types with an average of 3,000 examples each (96.3% accuracy) and applies a few-shot learning model to classify long-tail icon types with an average of 67 examples each (78.6% accuracy). Our system also detects contextual information that helps characterize icon semantics, including nearby text (95.3% accuracy) and modifier symbols added to the icon (87.4% accuracy). In a validation study with workers (n=23), we verified the usefulness of our generated icon labels. The icon types supported by our work cover 99.5% of collected icons, improving on the previously highest 78% coverage in icon classification work.
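A hedged sketch of the two-model pipeline the abstract describes: a conventional classifier handles the 98 common icon types and a few-shot model covers the long tail. The confidence-threshold fallback and the model interfaces are assumptions for illustration, not necessarily the paper's exact routing logic.

```python
# Illustrative routing between a common-type classifier and a few-shot model.
# The threshold-based fallback is an assumed mechanism.
from typing import Callable, Tuple

def label_icon(
    icon: object,
    common_model: Callable[[object], Tuple[str, float]],
    few_shot_model: Callable[[object], str],
    threshold: float = 0.5,
) -> str:
    label, confidence = common_model(icon)   # one of the 98 common icon types
    if confidence >= threshold:
        return label
    return few_shot_model(icon)              # fall back for long-tail types
```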

Authors
Jieshan Chen
Australian National University, Canberra, Australia
Amanda Swearngin
Apple, Seattle, Washington, United States
Jason Wu
Apple, Pittsburgh, Pennsylvania, United States
Titus Barik
Apple, Seattle, Washington, United States
Jeffrey Nichols
Apple Inc, San Diego, California, United States
Xiaoyi Zhang
Apple Inc, Seattle, Washington, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502073

CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities
Abstract

Large language models (LMs) offer unprecedented language generation capabilities and exciting opportunities for interaction design. However, their highly context-dependent capabilities are difficult to grasp and are often subjectively interpreted. In this paper, we argue that by curating and analyzing large interaction datasets, the HCI community can foster more incisive examinations of LMs' generative capabilities. Exemplifying this approach, we present CoAuthor, a dataset designed for revealing GPT-3's capabilities in assisting creative and argumentative writing. CoAuthor captures rich interactions between 63 writers and four instances of GPT-3 across 1445 writing sessions. We demonstrate that CoAuthor can address questions about GPT-3's language, ideation, and collaboration capabilities, and reveal its contribution as a writing "collaborator" under various definitions of good collaboration. Finally, we discuss how this work may facilitate a more principled discussion around LMs' promises and pitfalls in relation to interaction design. The dataset and an interface for replaying the writing sessions are publicly available at https://coauthor.stanford.edu.
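As an example of the analyses such a dataset enables, the sketch below tallies event types in one writing session; the JSON-lines layout and the "eventName" field are assumed for illustration, so consult https://coauthor.stanford.edu for the actual format.

```python
# Hypothetical analysis of a CoAuthor-style session log. The file layout
# (JSON lines) and the "eventName" field are assumptions, not the dataset's
# documented schema.
import json
from collections import Counter

def count_events(path: str) -> Counter:
    """Tally event types (e.g., writer keystrokes vs. model suggestions)."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts[json.loads(line).get("eventName", "unknown")] += 1
    return counts
```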

Award
Honorable Mention
Authors
Mina Lee
Stanford University, Stanford, California, United States
Percy Liang
Stanford University, Stanford, California, United States
Qian Yang
Cornell University, Ithaca, New York, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502030
