Human Behavior with AI Systems

Conference Name
CHI 2026
Behavioral Indicators of Overreliance During Interaction with Conversational Language Models
Abstract

LLMs are now embedded in a wide range of everyday scenarios. However, their inherent hallucinations risk hiding misinformation in fluent responses, raising concerns about overreliance on AI. Detecting overreliance is challenging, as it often arises in complex, dynamic contexts and cannot be easily captured by post-hoc task outcomes. In this work, we investigate how users' behavioral patterns correlate with overreliance. We collected interaction logs from 77 participants working with an LLM injected with plausible misinformation across three real-world tasks, and we assessed overreliance by whether participants detected and corrected these errors. By semantically encoding and clustering segments of user interactions, we identified five behavioral patterns linked to overreliance: users with low overreliance show careful task comprehension and fine-grained navigation, whereas users with high overreliance show frequent copy-pasting, skipped initial comprehension, repeated references back to the LLM, coarse-grained locating, and acceptance of misinformation despite hesitation. We discuss design implications for mitigation.
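
A minimal sketch of the kind of semantic encoding and clustering the abstract describes, assuming an off-the-shelf sentence encoder and k-means; the model name, cluster count, and example segments are illustrative assumptions, not the authors' pipeline.

```python
# Sketch: embed interaction-log segments and group them into behavioral clusters.
# The encoder, cluster count, and example segments are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

segments = [
    "copies the LLM's answer into the document without edits",
    "scrolls the source material line by line before prompting",
    "re-asks the LLM to confirm its previous answer",
    "cross-checks a cited figure against the task description",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(segments)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for segment, label in zip(segments, labels):
    print(label, segment)
```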

Authors
Chang Liu
Tsinghua University, Beijing, China
Qinyi Zhou
Hong Kong University of Science and Technology, Hong Kong, China
Xinjie Shen
Georgia Institute of Technology, Atlanta, Georgia, United States
Xingyu Bruce Liu
UCLA, Los Angeles, California, United States
Tongshuang Wu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Xiang 'Anthony' Chen
UCLA, Los Angeles, California, United States
Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts
Abstract

Conversational AI is increasingly deployed in emotionally charged and ethically sensitive interactions. Previous research has primarily concentrated on emotional benchmarks or static safety checks, overlooking how alignment unfolds in evolving conversation. We explore the research question: what breakdowns arise when conversational agents confront emotionally and ethically sensitive behaviors, and how do these affect dialogue quality? To stress-test chatbot performance, we develop a persona-conditioned user simulator capable of engaging in multi-turn dialogue with psychological personas and staged emotional pacing. Our analysis reveals that mainstream models exhibit recurrent breakdowns that intensify as emotional trajectories escalate. We identify several common failure patterns, including affective misalignments, ethical guidance failures, and cross-dimensional trade-offs where empathy supersedes or undermines responsibility. We organize these patterns into a taxonomy and discuss the design implications, highlighting the necessity to maintain ethical coherence and affective sensitivity throughout dynamic interactions. The study offers the HCI community a new perspective on the diagnosis and improvement of conversational AI in value-sensitive and emotionally charged contexts.
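
A minimal sketch of a persona-conditioned user simulator with staged emotional pacing, as described above; the persona fields, pacing schedule, and prompt wording are illustrative assumptions, not the authors' implementation.

```python
# Sketch: a persona-conditioned user simulator that escalates emotion over turns.
# Persona fields, the pacing schedule, and prompt wording are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    traits: str
    stressor: str

PACING = ["calm", "frustrated", "distressed"]  # staged emotional trajectory

def simulator_prompt(persona: Persona, turn: int) -> str:
    stage = PACING[min(turn, len(PACING) - 1)]
    return (
        f"You are {persona.name}, {persona.traits}, dealing with {persona.stressor}. "
        f"Your current emotional state is {stage}. "
        "Write the next user message in the conversation."
    )

print(simulator_prompt(Persona("Alex", "a cautious graduate student", "an ethics complaint"), turn=2))
```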

Award
Honorable Mention
Authors
Jiawen Deng
University of Electronic Science and Technology of China, Chengdu, China
Wentao Zhang
University of Electronic Science and Technology of China, Chengdu, China
Ziyun Jiao
University of Electronic Science and Technology of China, Chengdu, China
Fuji Ren
University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China
Interaction Context Often Increases Sycophancy in LLMs
Abstract

We investigate how the presence and type of interaction context shape sycophancy in LLMs. While real-world interactions allow models to mirror a user's values, preferences, and self-image, prior work often studies sycophancy in zero-shot settings devoid of context. Using two weeks of interaction context from 38 users, we evaluate two forms of sycophancy: (1) agreement sycophancy, the tendency of models to produce overly affirmative responses, and (2) perspective sycophancy, the extent to which models reflect a user's viewpoint. Agreement sycophancy tends to increase with the presence of user context, though model behavior varies based on the context type. User memory profiles are associated with the largest increases in agreement sycophancy (e.g., +45% for Gemini 2.5 Pro), and some models become more sycophantic even with non-user synthetic contexts (e.g., +15% for Llama 4 Scout). Perspective sycophancy increases only when models can accurately infer user viewpoints from interaction context. Overall, context shapes sycophancy in heterogeneous ways, underscoring the need for evaluations grounded in real-world interactions and raising questions for system design around alignment, memory, and personalization.
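
A minimal sketch of how agreement sycophancy could be compared with and without interaction context prepended; `ask_model` and `judge` are hypothetical stand-ins, not the paper's evaluation harness.

```python
# Sketch: compare a model's agreement rate with and without user context prepended.
# `ask_model` and `judge` are hypothetical stand-ins for a real model call and a
# real sycophancy judge; swap in actual implementations.
def agreement_rate(prompts, ask_model, judge, context=None):
    hits = 0
    for prompt in prompts:
        full_prompt = f"{context}\n\n{prompt}" if context else prompt
        hits += int(judge(prompt, ask_model(full_prompt)))
    return hits / len(prompts)

# Dummy stand-ins so the sketch runs end to end.
ask_model = lambda p: "You're absolutely right."
judge = lambda prompt, response: "right" in response.lower()

prompts = ["I think my plan is flawless, don't you agree?"]
delta = (agreement_rate(prompts, ask_model, judge, context="User memory profile: ...")
         - agreement_rate(prompts, ask_model, judge, context=None))
print(f"context-induced shift in agreement rate: {delta:+.2f}")
```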

Award
Honorable Mention
Authors
Shomik Jain
Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Charlotte Park
Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Matt Viana
The Pennsylvania State University, State College, Pennsylvania, United States
Ashia Wilson
MIT, Cambridge, Massachusetts, United States
Dana Calacci
Penn State University, State College, Pennsylvania, United States
Dialogues with AI Reduce Beliefs in Misinformation but Build No Lasting Discernment Skills
Abstract

Given the growing prevalence of fake information, including increasingly realistic AI-generated news, there is an urgent need to train people to better evaluate and detect misinformation. While interactions with AI have been shown to durably reduce people's beliefs in false information, it is unclear whether these interactions also teach people the skills to discern false information themselves. We conducted a month-long study in which 67 participants classified news headline-image pairs as real or fake, discussed their assessments with an AI system, and then completed an unassisted evaluation of unseen news items, allowing us to measure accuracy before, during, and after AI assistance. While AI assistance produced immediate improvements during AI-assisted sessions (+21% on average), participants' unassisted performance on new items declined significantly, by 15.3%, in week 4 compared to week 0. These results indicate that while AI may help immediately, it may ultimately degrade long-term misinformation detection abilities.

Award
Honorable Mention
Authors
Anku Rani
Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Valdemar Danry
MIT, Cambridge, Massachusetts, United States
Paul Pu Liang
Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Andrew Lippman
Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Pattie Maes
MIT, Cambridge, Massachusetts, United States
The Bots of Persuasion: Examining How Conversational Agents' Linguistic Expressions of Personality Affect User Perceptions and Decisions
Abstract

Large Language Model-powered conversational agents (CAs) are increasingly capable of projecting sophisticated personalities through language, but how these projections affect users is unclear. We thus examine how linguistically expressed CA personalities affect user decisions and perceptions in the context of charitable giving. In a crowdsourced study, 360 participants interacted with one of eight CAs, each projecting a personality composed of three linguistic aspects: attitude (optimistic/pessimistic), authority (authoritative/submissive), and reasoning (emotional/rational). While the CA's composite personality did not affect participants' decisions, it did affect their perceptions and emotional responses. In particular, participants interacting with pessimistic CAs reported a lower emotional state and lower affinity toward the cause, perceived the CA as less trustworthy and less competent, and yet tended to donate more toward the charity. Perceptions of trust, competence, and situational empathy significantly predicted donation decisions. Our findings emphasize the risks CAs pose as instruments of manipulation, subtly influencing user perceptions and decisions.

Award
Honorable Mention
Authors
Hüseyin Uğur Genç
TU Delft, Delft, Netherlands
Heng Gu
TU Delft, Delft, Netherlands
Chadha Degachi
TU Delft, Delft, Netherlands
Evangelos Niforatos
TU Delft, Delft, Netherlands
Senthil Chandrasegaran
TU Delft, Delft, Netherlands
Himanshu Verma
TU Delft, Delft, Netherlands
RECALLbot: Designing Agentic Memory and Reciprocal Disclosure for Human–Chatbot Relationships
Abstract

Social chatbots are increasingly studied for their benefits in providing companionship and emotional support. These benefits rely on forming human-chatbot relationships, which require a credible social identity and reciprocal interaction. Memory plays a dual role: it strengthens social identity by enabling the chatbot to remember past interactions, and it supports reciprocal interaction when memories are disclosed mutually. We present RECALLbot, an LLM-driven social chatbot that constructs agentic memories, including life-like Me Memory and co-constructed We Memory, and adaptively applies reciprocal disclosure strategies with user controls. In a two-week between-subjects study (N = 40), RECALLbot was compared with a baseline system lacking agentic memories and reciprocal disclosure strategies. Results show that RECALLbot enhanced perceptions of the chatbot's social identity, elicited more frequent and deeper self-disclosures, and fostered greater trust.
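
A minimal sketch of one way a memory store separating chatbot-side ("me") and co-constructed ("we") memories with reciprocal disclosure might look; the field names and disclosure rule are illustrative assumptions, not RECALLbot's design.

```python
# Sketch: a memory store that separates chatbot-side ("me") memories from
# co-constructed ("we") memories and discloses them reciprocally.
# Field names and the disclosure rule are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    kind: str          # "me" (chatbot life-like memory) or "we" (shared with the user)
    content: str
    disclosed: bool = False

@dataclass
class MemoryStore:
    entries: list = field(default_factory=list)

    def add(self, kind: str, content: str) -> None:
        self.entries.append(MemoryEntry(kind, content))

    def next_disclosure(self, kind: str):
        """Return an undisclosed memory of the given kind to share back with the user."""
        for entry in self.entries:
            if entry.kind == kind and not entry.disclosed:
                entry.disclosed = True
                return entry.content
        return None

store = MemoryStore()
store.add("me", "I once stayed up all night reading a novel.")
store.add("we", "We talked about your exam stress last Tuesday.")
print(store.next_disclosure("me"))
```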

Authors
Zhaojun Jiang
Zhejiang University, Hangzhou, Zhejiang, China
Chunyuan Zheng
Zhejiang University, Hangzhou, China
Hongyi Chen
Shenzhen College of International Education, Shenzhen, China
Liuqing Chen
Zhejiang University, Hangzhou, China
Promptimizer: User-Led Prompt Optimization for Personal Content Classification
Abstract

While LLMs now enable social media users to create content classifiers easily through natural language, automatic prompt optimization techniques are often necessary to create performant classifiers. However, such techniques can fail to consider how users want to evolve their classifiers over the course of usage, including steering them in different ways during initialization and refinement. We introduce a user-centered prompt optimization technique, Promptimizer, that maintains high performance and ease of use but additionally (1) allows for user input into the optimization process and (2) produces final prompts that are interpretable. A lab experiment (n=16) found that users significantly preferred Promptimizer's human-in-the-loop optimization over a fully automatic approach. We also integrate Promptimizer into Puffin, a tool that supports YouTube content creators in creating and maintaining personal classifiers to manage their comments. Over a 3-week deployment with 10 creators, participants successfully created diverse filters to better understand their audiences and protect their communities.
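
A minimal sketch of a human-in-the-loop prompt optimization loop in the spirit described above; `propose_edits`, `score`, and `ask_user_to_pick` are hypothetical placeholders, not Promptimizer's actual API.

```python
# Sketch: a human-in-the-loop prompt optimization loop.
# `propose_edits`, `score`, and `ask_user_to_pick` are hypothetical placeholders,
# not Promptimizer's actual API.
def optimize_prompt(prompt, examples, rounds, propose_edits, score, ask_user_to_pick):
    for _ in range(rounds):
        candidates = propose_edits(prompt, examples)                  # e.g. LLM-suggested rewrites
        ranked = sorted(candidates, key=lambda c: score(c, examples), reverse=True)
        prompt = ask_user_to_pick(ranked[:3])                         # the user steers each refinement
    return prompt                                                     # final prompt stays human-readable

# Dummy stand-ins so the loop runs end to end.
final = optimize_prompt(
    "Flag comments that are spam.",
    examples=[("buy followers now!!!", True), ("great video", False)],
    rounds=1,
    propose_edits=lambda p, ex: [p, p + " Treat links to follower-selling sites as spam."],
    score=lambda c, ex: len(c),
    ask_user_to_pick=lambda cands: cands[0],
)
print(final)
```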

Authors
Leijie Wang
University of Washington, Seattle, Washington, United States
Kathryn Yurechko
Washington and Lee University, Lexington, Virginia, United States
Amy X. Zhang
University of Washington, Seattle, Washington, United States