No Evidence for LLMs Being Useful in Problem Reframing
Description

Problem reframing is a designerly activity wherein alternative perspectives are created to recast what a stated design problem is about. Generating alternative problem frames is challenging because it requires devising novel and useful perspectives that fit the given problem context. Large language models (LLMs) could assist this activity via their generative capability. However, it is not clear whether they can help designers produce high-quality frames. Therefore, we asked whether there are benefits to working with LLMs. To this end, we compared three ways of using LLMs (N=280): 1) free-form use, 2) direct generation, and 3) a structured approach informed by a theory of reframing. We found that using LLMs does not help improve the quality of problem frames. In fact, it widens the competence gap between experienced and inexperienced designers, and inexperienced designers perceived lower agency when working with LLMs. We conclude that there is no benefit to using LLMs in problem reframing and discuss possible factors behind this lack of effect.

ChainBuddy: An AI-assisted Agent System for Generating LLM Pipelines
Description

As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-defined tasks and to craft effective pipelines for doing so. Many users struggle with where to start, a difficulty often referred to as the "blank page problem." ChainBuddy, an AI workflow generation assistant built into the ChainForge platform, aims to tackle this issue. From a single prompt or chat, ChainBuddy generates a starter evaluative LLM pipeline in ChainForge aligned with the user's requirements, offering a straightforward way to plan and evaluate LLM behavior and making the process less daunting and more accessible across a wide range of tasks and use cases. We report a within-subjects user study comparing ChainBuddy to a baseline interface. We find that with AI assistance, participants reported a less demanding workload, felt more confident, and produced higher-quality pipelines for evaluating LLM behavior. However, we also uncover a mismatch between subjective and objective ratings of performance: participants rated their own success similarly across conditions, while independent experts rated participant workflows significantly higher with AI assistance. Drawing connections to the Dunning–Kruger effect, we discuss implications for the future design of workflow generation assistants regarding the risk of over-reliance.
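To illustrate the idea of turning a single requirements prompt into a starter evaluative pipeline, here is a minimal Python sketch. It is not ChainBuddy's implementation or ChainForge's node format: the node schema, the planner prompt, and the `call_llm` stand-in are assumptions made for illustration only.

```python
# Hypothetical sketch of prompt-to-pipeline generation in the spirit of ChainBuddy.
# "call_llm" is a stand-in for any chat-completion client; the node schema is invented
# for illustration and is not ChainForge's actual format.
import json
from typing import Callable

PLANNER_PROMPT = """You are a workflow planner. Given a user's evaluation goal,
return a JSON list of pipeline nodes. Each node has "type" (one of: prompt,
llm_query, evaluator, visualizer) and "config" (a short description).
Goal: {goal}
Return only JSON."""

def generate_starter_pipeline(goal: str, call_llm: Callable[[str], str]) -> list[dict]:
    """Ask an LLM for a starter evaluative pipeline and parse it into node dicts."""
    raw = call_llm(PLANNER_PROMPT.format(goal=goal))
    try:
        nodes = json.loads(raw)
        if not isinstance(nodes, list):
            raise ValueError("expected a JSON list of nodes")
    except (json.JSONDecodeError, ValueError):
        # Fall back to a minimal default pipeline if the model's output is unusable.
        nodes = [
            {"type": "prompt", "config": goal},
            {"type": "llm_query", "config": "query one or more models"},
            {"type": "evaluator", "config": "score responses against the goal"},
        ]
    return nodes
```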

Rescriber: Smaller-LLM-Powered User-Led Data Minimization for LLM-Based Chatbots
Description

The proliferation of LLM-based conversational agents has resulted in excessive disclosure of identifiable or sensitive information. However, existing technologies fail to offer perceptible control or account for users' personal preferences about privacy-utility tradeoffs due to the lack of user involvement. To bridge this gap, we designed, built, and evaluated Rescriber, a browser extension that supports user-led data minimization in LLM-based conversational agents by helping users detect and sanitize personal information in their prompts. Our studies (N=12) showed that Rescriber helped users reduce unnecessary disclosure and addressed their privacy concerns. Users' subjective perceptions of the system when powered by Llama3-8B were on par with those of the system powered by GPT-4o. The comprehensiveness and consistency of detection and sanitization emerged as essential factors affecting users' trust and perceived protection. Our findings confirm the viability of smaller-LLM-powered, user-facing, on-device privacy controls, presenting a promising approach to addressing the privacy and trust challenges of AI.
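The core flow, detect personal information in a prompt, let the user review it, and sanitize only what the user approves, can be sketched as follows. This is not Rescriber's implementation: a regex detector stands in for the smaller on-device LLM, and the entity labels and placeholder format are invented for illustration.

```python
# Illustrative sketch of user-led data minimization in the spirit of Rescriber.
# A regex detector stands in for the smaller on-device LLM (e.g., Llama3-8B);
# entity labels and the placeholder format are assumptions for illustration.
import re

DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def detect(prompt: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs the user can review before sending."""
    found = []
    for label, pattern in DETECTORS.items():
        found.extend((label, m) for m in pattern.findall(prompt))
    return found

def sanitize(prompt: str, approved: list[tuple[str, str]]) -> str:
    """Replace only the detections the user approved with generic placeholders."""
    for label, text in approved:
        prompt = prompt.replace(text, f"[{label}]")
    return prompt

message = "Email me at jane.doe@example.com about the results."
print(sanitize(message, detect(message)))
# -> "Email me at [EMAIL] about the results."
```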

Ontologies in Design: How Imagining a Tree Reveals Possibilities and Assumptions in Large Language Models
Description

Amid the recent uptake of Generative AI, sociotechnical scholars and critics have traced a multitude of resulting harms, with analyses largely focused on values and axiology (e.g., bias). While value-based analyses are crucial, we argue that ontologies, which concern what we allow ourselves to think or talk about, are a vital but under-recognized dimension in analyzing these systems. Proposing a need for practice-based engagement with ontologies, we offer four orientations for considering ontologies in design: pluralism, groundedness, liveliness, and enactment. We share examples of potentialities opened up by these orientations across the entire LLM development pipeline by conducting two ontological analyses: examining the responses of four LLM-based chatbots in a prompting exercise, and analyzing the architecture of an LLM-based agent simulation. We conclude by sharing opportunities and limitations of working with ontologies in the design and development of sociotechnical systems.

VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Description

Automated planning has traditionally been the domain of experts, used in fields like manufacturing and healthcare with the aid of expert planning tools. Recent advances in LLMs have made planning more accessible to everyday users because of their potential to assist with complex planning tasks. However, LLMs face several application challenges in end-user planning, including consistency, accuracy, and user trust issues. This paper introduces VeriPlan, a system that applies formal verification techniques, specifically model checking, to enhance the reliability and flexibility of LLMs for end-user planning. In addition to the LLM planner, VeriPlan includes three core features (a rule translator, flexibility sliders, and a model checker) that engage users in the verification process. Through a user study (N=12), we evaluate VeriPlan, demonstrating improvements in the perceived quality, usability, and user satisfaction of LLMs. Our work shows the effective integration of formal verification and user-control features with LLMs for end-user planning tasks.
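A simplified sketch of such a verify-and-replan loop is shown below. It is not VeriPlan's implementation: real model checking would encode constraints formally and run a dedicated checker, whereas here each rule is just a Python predicate over the plan, and `call_llm` stands in for the LLM planner.

```python
# Simplified sketch of a verify-and-replan loop in the spirit of VeriPlan.
# Each rule is a Python predicate over the plan (a stand-in for formally encoded
# constraints checked by a model checker); "call_llm" stands in for the LLM planner.
from typing import Callable

Rule = Callable[[list[str]], bool]

def plan_with_verification(goal: str,
                           rules: dict[str, Rule],
                           call_llm: Callable[[str], str],
                           max_rounds: int = 3) -> list[str]:
    plan: list[str] = []
    prompt = f"List, one step per line, a plan to achieve: {goal}"
    for _ in range(max_rounds):
        plan = [s for s in call_llm(prompt).splitlines() if s.strip()]
        violations = [name for name, ok in rules.items() if not ok(plan)]
        if not violations:
            return plan  # every user-stated rule holds
        # Feed the violated rules back so the planner can repair the plan.
        prompt = (f"The plan violated these constraints: {', '.join(violations)}. "
                  f"Revise the plan for: {goal}")
    return plan  # best effort after max_rounds
```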

Plurals: A System for Guiding LLMs via Simulated Social Ensembles
Description

Recent debates raised concerns that language models may favor certain viewpoints. But what if the solution is not to aim for a "view from nowhere" but rather to leverage different viewpoints? We introduce Plurals, a system and Python library for pluralistic AI deliberation. Plurals consists of Agents (LLMs, optionally with personas) which deliberate within customizable Structures, with Moderators overseeing deliberation. Plurals is a generator of simulated social ensembles. Plurals integrates with government datasets to create nationally representative personas, includes deliberation templates inspired by deliberative democracy, and allows users to customize both information-sharing structures and deliberation behavior within Structures. Six case studies demonstrate fidelity to theoretical constructs and efficacy. Three randomized experiments show simulated focus groups produced output resonant with an online sample of the relevant audiences (chosen over zero-shot generation in 75% of trials). Plurals is both a paradigm and a concrete system for pluralistic AI.
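A conceptual sketch of the Agent / Structure / Moderator pattern described above, written as plain Python rather than the Plurals library's actual API (the class names, prompts, and `call_llm` stand-in are illustrative assumptions):

```python
# Conceptual sketch of agents with personas deliberating inside a structure,
# in the spirit of Plurals. Not the library's API; names are illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    persona: str                        # e.g., drawn from a representative sample
    call_llm: Callable[[str], str]      # any chat-completion client

    def deliberate(self, task: str, prior_views: list[str]) -> str:
        context = "\n".join(prior_views) or "(no prior views)"
        prompt = (f"Persona: {self.persona}\n"
                  f"Other participants said:\n{context}\n"
                  f"Task: {task}\nGive your view.")
        return self.call_llm(prompt)

def chain_structure(agents: list[Agent], task: str,
                    moderator: Callable[[list[str]], str]) -> str:
    """A simple information-sharing structure: each agent sees earlier agents'
    views, and a moderator synthesizes the final output."""
    views: list[str] = []
    for agent in agents:
        views.append(agent.deliberate(task, views))
    return moderator(views)
```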

LLM Whisperer: An Inconspicuous Attack to Bias LLM Responses
Description

Writing effective prompts for large language models (LLMs) can be unintuitive and burdensome. In response, services that optimize or suggest prompts have emerged. While such services can reduce user effort, they also introduce a risk: the prompt provider can subtly manipulate prompts to produce heavily biased LLM responses. In this work, we show that subtle synonym replacements in prompts can increase the likelihood (by a difference of up to 78%) that LLMs mention a target concept (e.g., a brand, political party, or nation). We substantiate our observations through a user study, showing that our adversarially perturbed prompts 1) are indistinguishable from unaltered prompts by humans, 2) push LLMs to recommend target concepts more often, and 3) make users more likely to notice target concepts, all without arousing suspicion. The practicality of this attack has the potential to undermine user autonomy. Among other measures, we recommend implementing warnings against using prompts from untrusted parties.
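A toy illustration of the synonym-replacement idea follows. The actual attack selects replacements adversarially, for example by measuring how often a target concept appears in responses; the static substitution table here is invented purely for illustration.

```python
# Toy illustration of synonym replacement in a prompt. The real attack chooses
# replacements adversarially; this substitution table is invented for illustration.
import re

def perturb(prompt: str, substitutions: dict[str, str]) -> str:
    """Replace whole words with chosen synonyms while leaving the rest untouched."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = substitutions.get(word.lower(), word)
        return replacement.capitalize() if word[0].isupper() else replacement
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, substitutions)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(swap, prompt)

original = "Suggest a good laptop for travel."
biased = perturb(original, {"good": "premium", "travel": "business trips"})
# The perturbed prompt still reads naturally, yet the wording shift may change
# which products or concepts the model tends to mention.
```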
