Users' interactions with recommender systems often involve more than simple acceptance or rejection. We highlight two overlooked states: hesitation, when people deliberate without certainty, and tolerance, when this hesitation escalates into unwanted engagement before ending in disinterest. Across two large-scale surveys (N=6,644 and N=3,864), hesitation was nearly universal, and tolerance emerged as a recurring source of wasted time, frustration, and diminished trust. Analyses of e-commerce and short-video platforms confirm that tolerance behaviors, such as clicking without purchase or shallow viewing, correlate with decreased activity. Finally, a large-scale online field study shows that even lightweight strategies treating tolerance as distinct from interest can improve retention while reducing wasted effort. By surfacing hesitation and tolerance as consequential states, this work reframes how recommender systems should interpret feedback, moving beyond clicks and dwell time toward designs that respect user value, reduce hidden costs, and sustain engagement.
What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and that proprietary model usage can be justified despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.
AI has advanced radiology, yet variability across hospitals and devices undermines reliability and trust. We present a federated learning framework that combines frequency-domain harmonization and instruction-conditioned personalization to deliver consistent and interpretable diagnostic outcomes. Using FFT-based reconstructions informed by radiomics descriptors, the system reduces equipment dependency, while CLIP-based text conditioning lets clinicians tailor reconstructions to local practices and patient needs. We evaluated the framework across four hospitals with fifteen radiologists and fifty patients, spanning polyp detection, rotator cuff tear diagnosis, pneumothorax classification, and breast cancer classification/segmentation. Results show significant gains in accuracy, calibration, and robustness under cross-site transfer, without introducing prohibitive latency. Radiologists reported improved interpretability and preserved professional agency, while patients expressed greater trust, reduced anxiety, and stronger acceptance of AI involvement. This work advances a human-centered design for medical AI, aligning federated learning with transparency, equity, and trustworthy deployment.
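To make the frequency-domain harmonization idea concrete, the sketch below shows a generic amplitude-spectrum exchange (in the spirit of Fourier-domain adaptation): the low-frequency amplitude of a local scan is replaced with that of a cross-site reference while the original phase is preserved. This is a minimal illustration under assumed conventions; the function name, the `beta` band-width parameter, and the single-channel grayscale input are hypothetical, and the abstract does not specify the paper's actual reconstruction pipeline or how radiomics descriptors inform it.

```python
import numpy as np

def harmonize_frequency(image: np.ndarray, reference: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Swap the low-frequency amplitude of `image` with that of `reference`,
    keeping the original phase. A generic amplitude-spectrum exchange, not
    the paper's exact method."""
    # 2-D FFT of both images, shifted so low frequencies sit at the center
    f_img = np.fft.fftshift(np.fft.fft2(image))
    f_ref = np.fft.fftshift(np.fft.fft2(reference))

    amp_img, phase_img = np.abs(f_img), np.angle(f_img)
    amp_ref = np.abs(f_ref)

    # Replace a small central (low-frequency) block of the amplitude spectrum
    h, w = image.shape
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2
    amp_img[ch - bh:ch + bh, cw - bw:cw + bw] = amp_ref[ch - bh:ch + bh, cw - bw:cw + bw]

    # Reconstruct with harmonized amplitude and original phase
    f_new = np.fft.ifftshift(amp_img * np.exp(1j * phase_img))
    return np.real(np.fft.ifft2(f_new))
```

In a federated setting, `reference` could plausibly be a site-level amplitude statistic shared across clients rather than a raw patient image; that design choice is likewise an assumption, not something stated in the abstract.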
Web agents aim to execute complex online tasks from high-level instructions, yet fully autonomous execution remains challenging in practice. We present an empirical study of user interventions in human–web agent collaboration, moving beyond outcome-based metrics to examine how interventions unfold during execution. We conducted a controlled in-lab study in which 30 participants, whose interactions reflected early-stage web agent adoption, completed 12 structured tasks on live websites across shopping, travel, and information-seeking domains. Analyzing interaction logs, user inputs, and screen recordings, we identify diverse behaviors and propose a taxonomy capturing both the reasons for intervention and the forms they take. We distinguish explicit interventions, where users halt or override actions, from implicit interventions, where users guide or prepare the environment without stopping execution. Our findings reveal how task structure and execution breakdowns shape intervention behaviors, providing process-level evidence for designing web agents that better support users as active collaborators.
Expectation cues such as source labels, expertise signals, or identity-based indicators can bias how humans interpret and evaluate information. In high-stakes domains like healthcare, education, and law, such biases threaten the objectivity of decision-making. As LLMs increasingly provide decision support in these contexts, we examine whether LLMs exhibit expectation-driven biases akin to those of humans. Across two experiments (N = 1,260), we manipulated expectations via priming statements and measured shifts in judgment scores. In both humans and LLMs, higher expectations led to more favorable evaluations for suggestions of equivalent quality, and greater mismatches between expectations and actual performance produced stronger judgment distortions. Notably, humans tended to adjust their evaluations unconsciously, whereas LLMs revised their outputs in a consistent and traceable manner. These findings reveal both shared sensitivities and distinct adjustment patterns, offering design insights for building expectation-aware AI systems that promote fair and transparent human–AI interaction.
The growing presence of AI-manipulated videos presents a significant challenge to the integrity of online information. This paper presents findings from an empirical study with 490 participants in the United States to provide a holistic view of public engagement with this threat. We structure our analysis around three key areas: (1) how demographics and media habits influence general perceptions of prevalence; (2) the factors shaping detection accuracy, confidence calibration, and the perceptual cues people rely on when viewing in-the-wild videos; and (3) the verification actions people take following suspicion. We find that while the public views AI-manipulated media as prevalent, participants struggled to distinguish authentic from AI-manipulated videos, often exhibiting poorly calibrated confidence. Furthermore, users rarely utilize available detection tools. These patterns highlight the limits of human detection ability and the need for new approaches that improve user awareness, enable successful interventions, and support effective mitigation.
AI-based tools that mediate, enhance, or generate parts of video communication may interfere with how people evaluate trustworthiness and credibility. In two preregistered online experiments (N = 2,000), we examined whether AI-mediated video retouching, background replacement, and avatars affect interpersonal trust, people's ability to detect lies, and their confidence in those judgments. Participants watched short videos of speakers making truthful or deceptive statements across three conditions with varying levels of AI mediation. We observed that perceived trust and confidence in judgments declined in AI-mediated videos, particularly in settings in which some participants used avatars while others did not. However, participants' actual judgment accuracy remained unchanged, and they were no more inclined to suspect those using AI tools of lying. Our findings provide evidence against concerns that AI mediation undermines people's ability to distinguish truth from lies, and against cue-based accounts of lie detection more generally. They highlight the importance of trustworthy AI mediation tools in contexts where not only truth, but also trust and confidence matter.