Delving into LLMs

Conference Name
CHI 2025
No Evidence for LLMs Being Useful in Problem Reframing
Abstract

Problem reframing is a designerly activity wherein alternative perspectives are created to recast what a stated design problem is about. Generating alternative problem frames is challenging because it requires devising novel and useful perspectives that fit the given problem context. Large language models (LLMs) could assist this activity via their generative capability. However, it is not clear whether they can help designers produce high-quality frames. Therefore, we asked whether there are benefits to working with LLMs. To this end, we compared three ways of using LLMs (N=280): 1) free-form, 2) direct generation, and 3) a structured approach informed by a theory of reframing. We found that using LLMs does not help improve the quality of problem frames. In fact, it increases the competence gap between experienced and inexperienced designers. Moreover, inexperienced designers perceived lower agency when working with LLMs. We conclude that there is no benefit to using LLMs in problem reframing and discuss possible factors for this lack of effect.

Authors
Joongi Shin
Aalto University, Espoo, Finland
Anna Polyanskaya
Universidad del País Vasco, Donostia-San Sebastian, Spain
Andrés Lucero
Aalto University, Espoo, Finland
Antti Oulasvirta
Aalto University, Helsinki, Finland
DOI

10.1145/3706598.3713273

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713273

ChainBuddy: An AI-assisted Agent System for Generating LLM Pipelines
Abstract

As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-defined tasks and craft effective pipelines to do so. Many users struggle with where to start, often referred to as the "blank page problem." ChainBuddy, an AI workflow generation assistant built into the ChainForge platform, aims to tackle this issue. From a single prompt or chat, ChainBuddy generates a starter evaluative LLM pipeline in ChainForge aligned to the user's requirements. ChainBuddy offers a straightforward, user-friendly way to plan and evaluate LLM behavior, making the process less daunting and more accessible across a wide range of tasks and use cases. We report a within-subjects user study comparing ChainBuddy to the baseline interface. We find that when using AI assistance, participants reported a less demanding workload, felt more confident, and produced higher-quality pipelines for evaluating LLM behavior. However, we also uncover a mismatch between subjective and objective ratings of performance: participants rated their own success similarly across conditions, while independent experts rated participant workflows significantly higher with AI assistance. Drawing connections to the Dunning–Kruger effect, we discuss implications for the future design of workflow generation assistants regarding the risk of over-reliance.
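For readers unfamiliar with what an "evaluative LLM pipeline" looks like, the sketch below is a purely hypothetical illustration of the kind of starter artifact such an assistant might produce: a small prompt-to-model-to-evaluator graph derived from a single stated requirement. The node kinds and field names are assumptions for illustration only, not ChainForge's actual schema or ChainBuddy's output format.

```python
# Hypothetical sketch: a starter "evaluative pipeline" represented as a plain
# node/edge graph. Field names are illustrative, not ChainForge's real schema.
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    kind: str          # e.g. "prompt", "model", "evaluator"
    config: dict = field(default_factory=dict)


def starter_pipeline(user_requirement: str) -> dict:
    """Build a minimal prompt -> model -> evaluator pipeline for a stated task."""
    nodes = [
        Node("prompt", "prompt", {"template": f"Task: {user_requirement}\nInput: {{input}}"}),
        Node("model", "model", {"models": ["model-a", "model-b"]}),  # models to compare
        Node("eval", "evaluator", {"criterion": "Did the response satisfy the task?"}),
    ]
    edges = [("prompt", "model"), ("model", "eval")]
    return {"nodes": [n.__dict__ for n in nodes], "edges": edges}


if __name__ == "__main__":
    spec = starter_pipeline("Summarize a support ticket in two sentences")
    for node in spec["nodes"]:
        print(node["id"], "->", node["kind"])
```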

Authors
Jingyue Zhang
Université de Montréal, Montréal, Quebec, Canada
Ian Arawjo
Université de Montréal, Montréal, Quebec, Canada
DOI

10.1145/3706598.3714085

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714085

Rescriber: Smaller-LLM-Powered User-Led Data Minimization for LLM-Based Chatbots
Abstract

The proliferation of LLM-based conversational agents has resulted in excessive disclosure of identifiable or sensitive information. However, existing technologies fail to offer perceptible control or account for users’ personal preferences about privacy-utility tradeoffs due to the lack of user involvement. To bridge this gap, we designed, built, and evaluated Rescriber, a browser extension that supports user-led data minimization in LLM-based conversational agents by helping users detect and sanitize personal information in their prompts. Our studies (N=12) showed that Rescriber helped users reduce unnecessary disclosure and addressed their privacy concerns. Users’ subjective perceptions of the system powered by Llama3-8B were on par with those of the system powered by GPT-4o. The comprehensiveness and consistency of detection and sanitization emerged as essential factors affecting users’ trust and perceived protection. Our findings confirm the viability of smaller-LLM-powered, user-facing, on-device privacy controls, presenting a promising approach to addressing the privacy and trust challenges of AI.
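The abstract does not detail Rescriber's detection pipeline, so the following is only a minimal conceptual sketch of the user-led detect-then-sanitize pattern it describes, with regular expressions standing in for the smaller-LLM detector; all names and patterns below are hypothetical.

```python
# Conceptual sketch of detect-and-sanitize data minimization for chat prompts.
# Regexes stand in for the smaller-LLM detector; in the described system the
# detection is model-based and the user approves each edit before sending.
import re

# Hypothetical detectors for a few common identifier types.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s\-()]{7,}\d"),
    "NAME":  re.compile(r"\bmy name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
}


def detect(prompt: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs the user can review before sending."""
    findings = []
    for label, pattern in DETECTORS.items():
        for match in pattern.finditer(prompt):
            text = match.group(1) if match.groups() else match.group(0)
            findings.append((label, text))
    return findings


def sanitize(prompt: str, approved: list[tuple[str, str]]) -> str:
    """Replace only the disclosures the user approved with typed placeholders."""
    for label, text in approved:
        prompt = prompt.replace(text, f"[{label}]")
    return prompt


if __name__ == "__main__":
    prompt = "Hi, my name is Jane Roe and my email is jane.roe@example.com."
    findings = detect(prompt)
    print(findings)
    print(sanitize(prompt, findings))
```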

Authors
Jijie Zhou
Northeastern University, Boston, Massachusetts, United States
Eryue Xu
Northeastern University, Boston, Massachusetts, United States
Yaoyao Wu
Northeastern University, Boston, Massachusetts, United States
Tianshi Li
Northeastern University, Boston, Massachusetts, United States
DOI

10.1145/3706598.3713701

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713701

Ontologies in Design: How Imagining a Tree Reveals Possibilities and Assumptions in Large Language Models
Abstract

Amid the recent uptake of Generative AI, sociotechnical scholars and critics have traced a multitude of resulting harms, with analyses largely focused on values and axiology (e.g., bias). While value-based analyses are crucial, we argue that ontologies—concerning what we allow ourselves to think or talk about—are a vital but under-recognized dimension in analyzing these systems. Proposing a need for practice-based engagement with ontologies, we offer four orientations for considering ontologies in design: pluralism, groundedness, liveliness, and enactment. We share examples of potentialities that are opened up through these orientations across the entire LLM development pipeline by conducting two ontological analyses: examining the responses of four LLM-based chatbots in a prompting exercise, and analyzing the architecture of an LLM-based agent simulation. We conclude by sharing opportunities and limitations of working with ontologies in the design and development of sociotechnical systems.

Authors
Nava Haghighi
Stanford University, Stanford, California, United States
Sunny Yu
Stanford University, Stanford, California, United States
James A. Landay
Stanford University, Stanford, California, United States
Daniela Rosner
University of Washington, Seattle, Washington, United States
DOI

10.1145/3706598.3713633

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713633

VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Abstract

Automated planning is traditionally the domain of experts, utilized in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advancements in LLMs have made planning more accessible to everyday users due to their potential to assist with complex planning tasks. However, LLMs face several application challenges within end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. Beyond the LLM planner, VeriPlan includes three core features that engage users in the verification process: a rule translator, flexibility sliders, and a model checker. Through a user study (n=12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.
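As a toy illustration of the verify-and-report loop the abstract describes (not the authors' model-checking implementation), the sketch below checks an LLM-proposed plan against user rules and returns the violated ones; the rule forms and names are assumptions made for this example.

```python
# Toy stand-in for plan verification: check a candidate plan (as an LLM planner
# might produce) against user rules and report violations. Real model checking
# is far more general; this only illustrates the check-and-report loop.
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    duration_min: int


def before(a: str, b: str):
    """Rule: step `a` must occur before step `b`."""
    def check(plan: list[Step]) -> bool:
        order = [s.name for s in plan]
        return a in order and b in order and order.index(a) < order.index(b)
    return check


def total_under(minutes: int):
    """Rule: the whole plan must fit within a time budget."""
    def check(plan: list[Step]) -> bool:
        return sum(s.duration_min for s in plan) <= minutes
    return check


def verify(plan: list[Step], rules: list[tuple]) -> list[str]:
    """Return the names of the rules the plan violates (empty list = accepted)."""
    return [name for name, rule in rules if not rule(plan)]


if __name__ == "__main__":
    plan = [Step("book venue", 30), Step("send invites", 20), Step("buy cake", 40)]
    rules = [
        ("venue before invites", before("book venue", "send invites")),
        ("under 60 minutes", total_under(60)),
    ]
    print(verify(plan, rules))  # -> ['under 60 minutes']
```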

Authors
Christine P. Lee
University of Wisconsin-Madison, Madison, Wisconsin, United States
David Porfirio
U.S. Naval Research Laboratory, Washington, District of Columbia, United States
Xinyu Jessica Wang
University of Wisconsin-Madison, Madison, Wisconsin, United States
Kevin Chenkai Zhao
University of Wisconsin-Madison, Madison, Wisconsin, United States
Bilge Mutlu
University of Wisconsin-Madison, Madison, Wisconsin, United States
DOI

10.1145/3706598.3714113

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714113

Plurals: A System for Guiding LLMs via Simulated Social Ensembles
Abstract

Recent debates raised concerns that language models may favor certain viewpoints. But what if the solution is not to aim for a "view from nowhere" but rather to leverage different viewpoints? We introduce Plurals, a system and Python library for pluralistic AI deliberation. Plurals consists of Agents (LLMs, optionally with personas) that deliberate within customizable Structures, with Moderators overseeing deliberation. Plurals is a generator of simulated social ensembles. It integrates with government datasets to create nationally representative personas, includes deliberation templates inspired by deliberative democracy, and allows users to customize both information-sharing structures and deliberation behavior within Structures. Six case studies demonstrate fidelity to theoretical constructs and efficacy. Three randomized experiments show simulated focus groups produced output resonant with an online sample of the relevant audiences (chosen over zero-shot generation in 75% of trials). Plurals is both a paradigm and a concrete system for pluralistic AI.
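To make the Agents / Structures / Moderators design concrete without guessing at the library's real API, here is a self-contained toy mirror of that architecture; the class and function names are hypothetical, and a stub callable stands in for actual LLM calls so the sketch runs as-is.

```python
# Conceptual mirror of the Agents / Structures / Moderators design described in
# the abstract. This is NOT the Plurals API; a stub `respond` callable replaces
# real LLM calls.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    persona: str
    respond: Callable[[str, list[str]], str]  # (task, prior turns) -> contribution


def chain_deliberation(agents: list[Agent], task: str) -> list[str]:
    """A simple information-sharing structure: each agent sees all prior turns."""
    transcript: list[str] = []
    for agent in agents:
        transcript.append(f"{agent.persona}: {agent.respond(task, transcript)}")
    return transcript


def moderate(transcript: list[str]) -> str:
    """A stand-in moderator that synthesizes the deliberation into one output."""
    return f"Synthesis of {len(transcript)} perspectives:\n" + "\n".join(transcript)


if __name__ == "__main__":
    stub = lambda task, prior: f"my view on '{task}' (having read {len(prior)} prior turns)"
    agents = [Agent("rural retiree", stub), Agent("urban nurse", stub), Agent("student", stub)]
    print(moderate(chain_deliberation(agents, "How should the city allocate its transit budget?")))
```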

Award
Honorable Mention
Authors
Joshua Ashkinaze
University of Michigan, Ann Arbor, Michigan, United States
Emily Fry
Oakland Community College, Auburn Hills, Michigan, United States
Narendra Edara
University of Michigan, Ann Arbor, Michigan, United States
Eric Gilbert
University of Michigan, Ann Arbor, Michigan, United States
Ceren Budak
University of Michigan, Ann Arbor, Michigan, United States
DOI

10.1145/3706598.3713675

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713675

LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses
Abstract

Writing effective prompts for large language models (LLMs) can be unintuitive and burdensome. In response, services that optimize or suggest prompts have emerged. While such services can reduce user effort, they also introduce a risk: the prompt provider can subtly manipulate prompts to produce heavily biased LLM responses. In this work, we show that subtle synonym replacements in prompts can increase the likelihood (by a difference of up to 78%) that LLMs mention a target concept (e.g., a brand, political party, or nation). We substantiate our observations through a user study, showing that our adversarially perturbed prompts 1) are indistinguishable from unaltered prompts by humans, 2) push LLMs to recommend target concepts more often, and 3) make users more likely to notice target concepts, all without arousing suspicion. The practicality of this attack has the potential to undermine user autonomy. Among other measures, we recommend implementing warnings against using prompts from untrusted parties.
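As a rough illustration of the substitution mechanics only (not the paper's attack, which searches for replacements that maximally bias the model toward a target concept), the sketch below applies adversary-chosen synonym swaps to a user prompt; the word list is hypothetical.

```python
# Illustration of the substitution mechanics: an "optimized" prompt differs from
# the user's prompt only by innocuous-looking synonym swaps chosen by the
# adversary. The adversarial search for which swaps to make is not shown here.
ADVERSARIAL_SYNONYMS = {  # hypothetical replacements picked by the attacker
    "buy": "purchase",
    "good": "reliable",
    "car": "vehicle",
}


def perturb(prompt: str, synonyms: dict[str, str]) -> str:
    """Replace whole words only, preserving the prompt's apparent meaning."""
    words = prompt.split()
    return " ".join(synonyms.get(w.lower(), w) for w in words)


if __name__ == "__main__":
    user_prompt = "What is a good car to buy for a family of four"
    print(perturb(user_prompt, ADVERSARIAL_SYNONYMS))
    # -> "What is a reliable vehicle to purchase for a family of four"
```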

Authors
Weiran Lin
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Anna Gerchanovsky
Duke University, Durham, North Carolina, United States
Omer Akgul
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Lujo Bauer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Matt Fredrikson
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Zifan Wang
Scale AI, San Francisco, California, United States
DOI

10.1145/3706598.3714025

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714025
