Large Language Models

Conference Name
CHI 2023
Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study
Abstract

Collecting data is one of the bottlenecks of Human-Computer Interaction (HCI) research. Motivated by this, we explore the potential of large language models (LLMs) in generating synthetic user research data. We use OpenAI’s GPT-3 model to generate open-ended questionnaire responses about experiencing video games as art, a topic not tractable with traditional computational user models. We test whether synthetic responses can be distinguished from real responses, analyze errors of synthetic data, and investigate content similarities between synthetic and real data. We conclude that GPT-3 can, in this context, yield believable accounts of HCI experiences. Given the low cost and high speed of LLM data generation, synthetic data should be useful in ideating and piloting new experiments, although any findings must obviously always be validated with real data. The results also raise concerns: if employed by malicious users of crowdsourcing services, LLMs may make crowdsourcing of self-report data fundamentally unreliable.
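The generation step the abstract describes is cheap to prototype. Below is a minimal sketch of producing one synthetic open-ended questionnaire response with the OpenAI Python SDK; the persona framing, model name, and sampling parameters are illustrative assumptions (the paper itself used GPT-3's completion endpoint), not the authors' exact setup.

```python
# Hedged sketch: synthesizing one open-ended survey response with an LLM.
# Prompt wording, model, and parameters are assumptions for illustration;
# the paper used GPT-3, not this exact configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def synthetic_response(game: str, temperature: float = 0.9) -> str:
    """Generate one synthetic answer to an open-ended questionnaire item."""
    prompt = (
        "A study participant was asked: 'Describe an experience in which "
        f"the video game {game} felt like art to you.' They answered:"
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any capable model works
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # higher values diversify the synthetic sample
        max_tokens=200,
    )
    return completion.choices[0].message.content


print(synthetic_response("Journey"))
```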

Award
Best Paper
Authors
Perttu Hämäläinen
Aalto University, Espoo, Finland
Mikke Tavast
Aalto University, Espoo, Finland
Anton Kunnari
University of Helsinki, Helsinki, Finland
Paper URL

https://doi.org/10.1145/3544548.3580688

Enabling Conversational Interaction with Mobile UI using Large Language Models
Abstract

Conversational agents promise to let users interact with mobile devices using natural language. However, to perform diverse UI tasks with natural language, developers typically need to create separate datasets and models for each specific task, which is expensive and labor-intensive. Recently, pre-trained large language models (LLMs) have been shown to generalize to various downstream tasks when prompted with a handful of examples from the target task. This paper investigates the feasibility of enabling versatile conversational interactions with mobile UIs using a single LLM. We designed prompting techniques to adapt an LLM to mobile UIs and experimented with four important modeling tasks that address various scenarios in conversational interaction. Our method achieved competitive performance on these challenging tasks without requiring dedicated datasets or training, offering a lightweight and generalizable approach to enabling language-based mobile interaction.
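As a concrete illustration of the prompting approach the abstract describes, here is a hedged sketch of few-shot prompt assembly for one such task (screen summarization). The screen serialization format and the examples are hypothetical stand-ins, not the authors' actual prompts.

```python
# Hedged sketch of few-shot prompt assembly for screen summarization; the
# serialization format and examples are assumptions, not the paper's prompts.
FEW_SHOT_EXAMPLES = [
    (
        '<screen><button text="Sign in"/><input hint="Email"/></screen>',
        "A login screen asking for the user's email.",
    ),
    (
        '<screen><text value="Order #1234 shipped"/><button text="Track"/></screen>',
        "An order-status screen with a shipment-tracking button.",
    ),
]


def build_summarization_prompt(target_screen: str) -> str:
    """Assemble a few-shot prompt: instructions, examples, then the target."""
    parts = ["Summarize each mobile screen in one sentence."]
    for screen, summary in FEW_SHOT_EXAMPLES:
        parts.append(f"Screen: {screen}\nSummary: {summary}")
    parts.append(f"Screen: {target_screen}\nSummary:")
    return "\n\n".join(parts)


print(build_summarization_prompt('<screen><button text="Pay now"/></screen>'))
```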

Authors
Bryan Wang
University of Toronto, Toronto, Ontario, Canada
Gang Li
Google Research, Mountain View, California, United States
Yang Li
Google Research, Mountain View, California, United States
Paper URL

https://doi.org/10.1145/3544548.3580895

PopBlends: Strategies for Conceptual Blending with Large Language Models
Abstract

Pop culture is an important aspect of communication. On social media, people often post pop-culture reference images that connect an event, product, or other entity to a pop-culture domain. Creating these images is a creative challenge that requires finding a conceptual connection between the user's topic and a pop-culture domain; in cognitive theory, this task is called conceptual blending. We present PopBlends, a system that automatically suggests conceptual blends. The system explores three approaches that involve both traditional knowledge-extraction methods and large language models. Our annotation study shows that all three methods provide connections with similar accuracy but with very different characteristics. Our user study shows that, with the system, people found twice as many blend suggestions as they did without it, at half the mental demand. We discuss the advantages of combining large language models with knowledge bases to support divergent and convergent thinking.
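To make the blend-suggestion idea concrete, here is a minimal sketch of the two ingredient strategies the abstract mentions: prompting an LLM for a shared attribute, and intersecting attribute sets drawn from a knowledge base. Function names, prompt wording, and the toy attribute sets are hypothetical, not PopBlends' actual pipeline.

```python
# Hedged sketch of two conceptual-blending strategies; all names and data
# here are illustrative, not PopBlends' implementation.

def blend_prompt(topic: str, domain: str) -> str:
    """LLM strategy: ask for an attribute the topic and domain share."""
    return (
        f"List attributes of '{topic}' and attributes of '{domain}', then "
        "name one attribute they share that could anchor a visual mash-up."
    )


def attribute_overlap(topic_attrs: set[str], domain_attrs: set[str]) -> set[str]:
    """Knowledge-base strategy: intersect attribute sets extracted elsewhere."""
    return topic_attrs & domain_attrs


# e.g. blending a matcha latte with Shrek, connected by the color green
print(attribute_overlap({"green", "bitter", "iced"},
                        {"green", "ogre", "swamp"}))  # -> {'green'}
```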

Authors
Sitong Wang
Columbia University, New York City, New York, United States
Savvas Petridis
Columbia University, New York, New York, United States
Taeahn Kwon
Columbia University, New York, New York, United States
Xiaojuan Ma
Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Lydia B. Chilton
Columbia University, New York, New York, United States
Paper URL

https://doi.org/10.1145/3544548.3580948

On the Design of AI-powered Code Assistants for Notebooks
Abstract

AI-powered code assistants, such as Copilot, are quickly becoming a ubiquitous component of contemporary coding contexts. Among these environments, computational notebooks, such as Jupyter, are of particular interest as they provide rich interface affordances that interleave code and output in a manner that allows for both exploratory and presentational work. Despite their popularity, little is known about the appropriate design of code assistants in notebooks. We investigate the potential of code assistants in computational notebooks by creating a design space (reified from a survey of extant tools) and through an interview-design study (with 15 practicing data scientists). Through this work, we identify challenges and opportunities for future systems in this space, such as the value of disambiguation for tasks like data visualization, the potential of tightly scoped domain-specific tools (like linters), and the importance of polite assistants.

Authors
Andrew M. McNutt
University of Chicago, Chicago, Illinois, United States
Chenglong Wang
Microsoft Research, Redmond, Washington, United States
Robert A. DeLine
Microsoft Corp, Redmond, Washington, United States
Steven M. Drucker
Microsoft Research, Redmond, Washington, United States
Paper URL

https://doi.org/10.1145/3544548.3580940

Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts
Abstract

Pre-trained large language models ("LLMs") like GPT-3 can engage in fluent, multi-turn instruction-taking out-of-the-box, making them attractive materials for designing natural language interactions. Using natural language to steer LLM outputs ("prompting") has emerged as an important design technique potentially accessible to non-AI-experts. Crafting effective prompts can be challenging, however, and prompt-based interactions are brittle. Here, we explore whether non-AI-experts can successfully engage in "end-user prompt engineering" using a design probe—a prototype LLM-based chatbot design tool supporting development and systematic evaluation of prompting strategies. Ultimately, our probe participants explored prompt designs opportunistically, not systematically, and struggled in ways echoing end-user programming systems and interactive machine learning systems. Expectations stemming from human-to-human instructional experiences, and a tendency to overgeneralize, were barriers to effective prompt design. These findings have implications for non-AI-expert-facing LLM-based tool design and for improving LLM-and-prompt literacy among programmers and the public, and present opportunities for further research.
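The probe's "systematic evaluation of prompting strategies" contrasts with the opportunistic exploration participants actually did. As a hedged sketch of what systematic evaluation means in practice, the snippet below scores every prompt variant against one fixed test set instead of eyeballing single outputs; llm is a hypothetical model-call function, and nothing here reflects the probe's implementation.

```python
# Hedged sketch of systematic prompt evaluation: score each prompt variant
# against one shared test set. llm is a hypothetical model-call function.
from typing import Callable


def evaluate_prompts(
    prompts: list[str],
    test_inputs: list[str],
    expected: list[str],
    llm: Callable[[str], str],
) -> dict[str, float]:
    """Return each prompt variant's exact-match accuracy on the test set."""
    scores: dict[str, float] = {}
    for prompt in prompts:
        hits = sum(
            llm(f"{prompt}\n\nInput: {x}\nOutput:").strip() == y
            for x, y in zip(test_inputs, expected)
        )
        scores[prompt] = hits / len(test_inputs)
    return scores


# Toy usage with a stub "model" that uppercases whatever follows "Input: ":
acc = evaluate_prompts(
    prompts=["Uppercase the input."],
    test_inputs=["hi"],
    expected=["HI"],
    llm=lambda p: p.rsplit("Input: ", 1)[1].split("\n")[0].upper(),
)
print(acc)  # -> {'Uppercase the input.': 1.0}
```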

Authors
J.D. Zamfirescu-Pereira
UC Berkeley, Berkeley, California, United States
Richmond Y. Wong
Georgia Institute of Technology, Atlanta, Georgia, United States
Bjoern Hartmann
UC Berkeley, Berkeley, California, United States
Qian Yang
Cornell University, Ithaca, New York, United States
Paper URL

https://doi.org/10.1145/3544548.3581388

Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions
Abstract

Large language models can produce high volumes of human-like text and can be used to generate persuasive misinformation, yet the risks remain under-explored. To address this gap, this work first examined the characteristics of AI-generated misinformation (AI-misinfo) compared with human creations, and then evaluated the applicability of existing solutions. We compiled human-created COVID-19 misinformation and abstracted it into narrative prompts for a language model to output AI-misinfo. We found significant linguistic differences within human-AI pairs, as well as patterns by which AI-misinfo enhances details, communicates uncertainties, draws conclusions, and simulates personal tones. While existing models remained capable of classifying AI-misinfo, we observed a significant performance drop compared to human-misinfo. Results suggested that existing information assessment guidelines have questionable applicability, as AI-misinfo tends to meet criteria for evidence credibility, source transparency, and limitation acknowledgment. We discuss implications for practitioners, researchers, and journalists, as AI can create new challenges for the societal problem of misinformation.

Award
Honorable Mention
Authors
Jiawei Zhou
Georgia Institute of Technology, Atlanta, Georgia, United States
Yixuan Zhang
Georgia Institute of Technology, Atlanta, Georgia, United States
Qianni Luo
Ohio University, Athens, Ohio, United States
Andrea G. Parker
Georgia Institute of Technology, Atlanta, Georgia, United States
Munmun De Choudhury
Georgia Institute of Technology, Atlanta, Georgia, United States
Paper URL

https://doi.org/10.1145/3544548.3581318
