User Studies on Large Language Models

Conference Name
CHI 2024
The Effects of Perceived AI Use On Content Perceptions
Abstract

There is a potential future where the content created by a human and an AI are indistinguishable. In this future, if you can't tell the difference, does it matter? We conducted a 3 (Assigned creator: human, human with AI assistance, AI) by 4 (Context: news, travel, health, and jokes) mixed-design experiment where participants evaluated human-written content that was presented as created by a human, a human with AI assistance, or an AI. We found that participants felt more negatively about the content creator and were less satisfied when they thought AI was used, but assigned creator had no effect on content judgments. We also identified five interpretations for how participants thought AI use affected the content creation process. Our work suggests that informing users about AI use may not have the intended effect of helping consumers make content judgments and may instead damage the relationship between creators and followers.

Authors
Irene Rae
Google, Madison, Wisconsin, United States
Paper URL

doi.org/10.1145/3613904.3642076

DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models
Abstract

We characterize and demonstrate how the principles of direct manipulation can improve interaction with large language models. This includes: continuous representation of generated objects of interest; reuse of prompt syntax in a toolbar of commands; manipulable outputs to compose or control the effect of prompts; and undo mechanisms. This idea is exemplified in DirectGPT, a user interface layer on top of ChatGPT that works by transforming direct manipulation actions to engineered prompts. A study shows participants were 50% faster and relied on 50% fewer and 72% shorter prompts to edit text, code, and vector images compared to baseline ChatGPT. Our work contributes a validated approach to integrate LLMs into traditional software using direct manipulation. Data, code, and demo available at https://osf.io/3wt6s.
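The abstract describes DirectGPT's core mechanism as transforming direct manipulation actions into engineered prompts. The sketch below is a minimal illustration of that idea, not the paper's implementation; the function name apply_command, the prompt wording, and the use of the OpenAI Python client are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not DirectGPT's actual code): a toolbar command
# applied to a text selection is rewritten into an engineered prompt that
# edits only the selected span, so the interaction feels like direct
# manipulation rather than open-ended chatting.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def apply_command(document: str, selection: tuple[int, int], command: str) -> str:
    """Turn a toolbar command on a selected span into an LLM edit."""
    start, end = selection
    selected = document[start:end]
    prompt = (
        "Apply the following edit to the text between <sel> tags only.\n"
        f"Edit: {command}\n"
        f"<sel>{selected}</sel>\n"
        "Return only the edited fragment, with no explanation."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    edited = response.choices[0].message.content.strip()
    # Splice the edited fragment back, leaving the rest of the document intact.
    return document[:start] + edited + document[end:]

# Example: the user selects a sentence and clicks a hypothetical "make concise" command.
# new_doc = apply_command(doc, (120, 245), "make this sentence more concise")
```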

Award
Honorable Mention
Authors
Damien Masson
University of Waterloo, Waterloo, Ontario, Canada
Sylvain Malacria
Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France
Géry Casiez
Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, Lille, France
Daniel Vogel
University of Waterloo, Waterloo, Ontario, Canada
Paper URL

doi.org/10.1145/3613904.3642462

From Text to Self: Users’ Perception of AIMC Tools on Interpersonal Communication and Self
Abstract

In the rapidly evolving landscape of AI-mediated communication (AIMC), tools powered by Large Language Models (LLMs) are becoming integral to interpersonal communication. Employing a mixed-methods approach, we conducted a one-week diary and interview study to explore users’ perceptions of these tools’ ability to: 1) support interpersonal communication in the short-term, and 2) lead to potential long-term effects. Our findings indicate that participants view AIMC support favorably, citing benefits such as increased communication confidence, finding precise language to express their thoughts, and navigating linguistic and cultural barriers. However, our findings also show current limitations of AIMC tools, including verbosity, unnatural responses, and excessive emotional intensity. These shortcomings are further exacerbated by user concerns about inauthenticity and potential overreliance on the technology. We identify four key communication spaces delineated by communication stakes (high or low) and relationship dynamics (formal or informal) that differentially predict users’ attitudes toward AIMC tools. Specifically, participants report that these tools are more suitable for communicating in formal relationships than informal ones and more beneficial in high-stakes than low-stakes communication.

Award
Best Paper
Authors
Yue Fu
University of Washington, Seattle, Washington, United States
Sami Foell
University of Washington, Seattle, Washington, United States
Xuhai "Orson" Xu
University of Washington, Seattle, Washington, United States
Alexis Hiniker
University of Washington, Seattle, Washington, United States
Paper URL

doi.org/10.1145/3613904.3641955

Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Abstract

Prompt-based interfaces for Large Language Models (LLMs) have made prototyping and building AI-powered applications easier than ever before. However, identifying potential harms that may arise from AI applications remains a challenge, particularly during prompt-based prototyping. To address this, we present Farsight, a novel in situ interactive tool that helps people identify potential harms from the AI applications they are prototyping. Based on a user's prompt, Farsight highlights news articles about relevant AI incidents and allows users to explore and edit LLM-generated use cases, stakeholders, and harms. We report design insights from a co-design study with 10 AI prototypers and findings from a user study with 42 AI prototypers. After using Farsight, AI prototypers in our user study are better able to independently identify potential harms associated with a prompt and find our tool more useful and usable than existing resources. Their qualitative feedback also highlights that Farsight encourages them to focus on end-users and think beyond immediate harms. We discuss these findings and reflect on their implications for designing AI prototyping experiences that meaningfully engage with AI harms. Farsight is publicly accessible at: https://pair-code.github.io/farsight.
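The abstract describes Farsight as generating use cases, stakeholders, and harms from a user's prompt for the prototyper to review. The sketch below illustrates that single step under assumed prompt wording and OpenAI client usage; it is not Farsight's actual pipeline and omits the retrieval of related AI-incident news articles.

```python
# Minimal sketch (assumed, not Farsight's implementation): given an AI
# application prompt, ask an LLM to propose use cases, stakeholders, and
# potential harms for the prototyper to inspect and edit.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def surface_harms(app_prompt: str) -> dict:
    """Return LLM-suggested use cases, stakeholders, and harms as JSON."""
    instruction = (
        "You help developers anticipate harms of AI applications.\n"
        f"Application prompt: {app_prompt}\n"
        'Respond with JSON containing three lists: "use_cases", '
        '"stakeholders", and "harms".'
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": instruction}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Example: review suggested harms for a prototype summarization prompt.
# report = surface_harms("Summarize patient intake forms for busy clinicians")
# print(report["harms"])
```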

Award
Honorable Mention
Authors
Zijie J. Wang
Georgia Tech, Atlanta, Georgia, United States
Chinmay Kulkarni
Emory University, Atlanta, Georgia, United States
Lauren Wilcox
Georgia Institute of Technology, Atlanta, Georgia, United States
Michael Terry
Google, Cambridge, Massachusetts, United States
Michael Madaio
Google Research, New York, New York, United States
Paper URL

doi.org/10.1145/3613904.3642335

“As an AI language model, I cannot”: Investigating LLM Denials of User Requests
Abstract

Users ask large language models (LLMs) to help with their homework, for lifestyle advice, or for support in making challenging decisions. Yet LLMs are often unable to fulfil these requests, either as a result of their technical inabilities or policies restricting their responses. To investigate the effect of LLMs denying user requests, we evaluate participants' perceptions of different denial styles. We compare specific denial styles (baseline, factual, diverting, and opinionated) across two studies, respectively focusing on LLMs' technical limitations and their social policy restrictions. Our results indicate significant differences in users' perceptions of the denials between the denial styles. The baseline denial, which provided participants with brief denials without any motivation, was rated significantly higher on frustration and significantly lower on usefulness, appropriateness, and relevance. In contrast, we found that participants generally appreciated the diverting denial style. We provide design recommendations for LLM denials that better meet people's denial expectations.

Award
Honorable Mention
Authors
Joel Wester
Aalborg University, Aalborg, Denmark
Tim Schrills
Institute for Multimedia and Interactive Systems, University of Luebeck, Luebeck, Germany
Henning Pohl
Aalborg University, Aalborg, Denmark
Niels van Berkel
Aalborg University, Aalborg, Denmark
Paper URL

doi.org/10.1145/3613904.3642135
