Human-AI Collaboration

Conference
CHI 2025
Script&Shift: A Layered Interface Paradigm for Integrating Content Development and Rhetorical Strategy with LLM Writing Assistants
Abstract

Good writing is a dynamic process of knowledge transformation, where writers refine and evolve ideas through planning, translating, and reviewing. Generative AI-powered writing tools can enhance this process but may also disrupt the natural flow of writing, such as when using LLMs for complex tasks like restructuring content across different sections or creating smooth transitions. We introduce Script&Shift, a layered interface paradigm designed to minimize these disruptions by aligning writing intents with LLM capabilities to support diverse content development and rhetorical strategies. By bridging envisioning, semantic, and articulatory distances, Script&Shift interactions allow writers to leverage LLMs for various content development tasks (scripting) and experiment with diverse organization strategies while tailoring their writing for different audiences (shifting). This approach preserves creative control while encouraging divergent and iterative writing. Our evaluation shows that Script&Shift enables writers to creatively and efficiently incorporate LLMs while preserving a natural flow of composition.

Award
Honorable Mention
Authors
Momin Naushad Siddiqui
Georgia Institute of Technology, Atlanta, Georgia, United States
Roy Pea
Stanford University, Stanford, California, United States
Hariharan Subramonyam
Stanford University, Stanford, California, United States
DOI

10.1145/3706598.3714119

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714119

Video
"When Two Wrongs Don't Make a Right" - Examining Confirmation Bias and the Role of Time Pressure During Human-AI Collaboration in Computational Pathology
Abstract

Artificial intelligence (AI)-based decision support systems hold promise for enhancing diagnostic accuracy and efficiency in computational pathology. However, human-AI collaboration can introduce and amplify cognitive biases, like confirmation bias caused by false confirmation when erroneous human opinions are reinforced by inaccurate AI output. This bias may increase under time pressure, a ubiquitous factor in routine pathology, as it strains practitioners' cognitive resources. We quantified confirmation bias triggered by AI-induced false confirmation and examined the role of time constraints in a web-based experiment, where trained pathology experts (n=28) estimated tumor cell percentages. Our results suggest that AI integration fuels confirmation bias, evidenced by a statistically significant positive linear-mixed-effects model coefficient linking AI recommendations mirroring flawed human judgment and alignment with system advice. Conversely, time pressure appeared to weaken this relationship. These findings highlight potential risks of AI in healthcare and aim to support the safe integration of clinical decision support systems.

Award
Honorable Mention
Authors
Emely Rosbach
Technische Hochschule Ingolstadt, Ingolstadt, Germany
Jonas Ammeling
Technische Hochschule Ingolstadt, Ingolstadt, Germany
Sebastian Krügel
University of Hohenheim, Stuttgart, Germany
Angelika Kießig
Katholische Universität Eichstätt, Eichstätt, Germany
Alexis Fritz
Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany
Jonathan Ganz
Technische Hochschule Ingolstadt, Ingolstadt, Germany
Chloé Puget
Freie Universität Berlin, Berlin, Germany
Taryn Donovan
Animal Medical Center, New York, New York, United States
Andrea Klang
University of Veterinary Medicine Vienna, Vienna, Austria
Maximilian C. Köller
Medical University of Vienna, Vienna, Germany
Pompei Bolfa
Ross University School of Veterinary Medicine, Basseterre, Saint Kitts and Nevis
Marco Tecilla
University of Milan, Milan, Italy
Daniela Denk
Ludwig-Maximilians-University of Munich, Munich, Germany
Matti Kiupel
Michigan State University, East Lansing, Michigan, United States
Georgios Paraschou
Ross University School of Veterinary Medicine, Basseterre, Saint Kitts and Nevis
Mun Keong Kok
Faculty of Veterinary Medicine, Universiti Putra Malaysia, Serdang, Malaysia
Alexander F. H. Haake
Freie Universität Berlin, Berlin, Germany
Ronald R. de Krijger
UMC Utrecht, Utrecht, Netherlands
Andreas F.-P. Sonnen
UMC Utrecht, Utrecht, Netherlands
Tanit Kasantikul
Michigan State University, East Lansing, Michigan, United States
Gerry M. Dorrestein
NOIVBD, Vessem, Netherlands
Rebecca C. Smedley
Michigan State University, East Lansing, Michigan, United States
Nikolas Stathonikos
UMC Utrecht, Utrecht, Netherlands
Matthias Uhl
University of Hohenheim, Stuttgart, Germany
Christof A. Bertram
University of Veterinary Medicine Vienna, Vienna, Austria
Andreas Riener
Technische Hochschule Ingolstadt, Ingolstadt, Bavaria, Germany
Marc Aubreville
Flensburg University of Applied Sciences, Flensburg, Germany
DOI

10.1145/3706598.3713319

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713319

Video
How the Role of Generative AI Shapes Perceptions of Value in Human-AI Collaborative Work
Abstract

As artificial intelligence (AI) continues to transform the modern workplace, generative AI (GenAI) has emerged as a prominent tool capable of augmenting work processes. Defined by its ability to create or modify content, GenAI differs significantly from traditional machine learning models that classify, recognize, or predict patterns from existing data. This study explores the role of GenAI in shaping perceptions of AI’s contribution and how these perceptions influence both creators’ internal assessments of their work and their anticipation of external evaluators’ assessments. Our research develops and empirically tests a structural model through a between-subjects experiment, revealing that the role GenAI plays in the work process significantly impacts perceived enhancements in work quality and effort relative to human input. Additionally, we identify a critical trade-off between fostering worker assessments of creativity and managing perceived external assessments of the work’s value.

Authors
Aaron Schecter
University of Georgia, Athens, Georgia, United States
Benjamin Richardson
University of Georgia, Athens, Georgia, United States
DOI

10.1145/3706598.3713946

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713946

Video
Supporting Co-Adaptive Machine Teaching through Human Concept Learning and Cognitive Theories
Abstract

An important challenge in interactive machine learning, particularly in subjective or ambiguous domains, is fostering bi-directional alignment between humans and models. Users teach models their concept definition through data labeling, while refining their own understandings throughout the process. To facilitate this, we introduce MOCHA, an interactive machine learning tool informed by two theories of human concept learning and cognition. First, it utilizes a neuro-symbolic pipeline to support Variation Theory-based counterfactual data generation. By asking users to annotate counterexamples that are syntactically and semantically similar to already-annotated data but predicted to have different labels, the system can learn more effectively while helping users understand the model and reflect on their own label definitions. Second, MOCHA uses Structural Alignment Theory to present groups of counterexamples, helping users comprehend alignable differences between data items and annotate them in batch. We validated MOCHA's effectiveness and usability through a lab study with 18 participants.

Award
Best Paper
Authors
Simret Araya Gebreegziabher
University of Notre Dame, Notre Dame, Indiana, United States
Yukun Yang
University of Notre Dame, Notre Dame, Indiana, United States
Elena L. Glassman
Harvard University, Allston, Massachusetts, United States
Toby Jia-Jun Li
University of Notre Dame, Notre Dame, Indiana, United States
DOI

10.1145/3706598.3713708

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713708

Video
Intent Tagging: Exploring Micro-Prompting Interactions for Supporting Granular Human-GenAI Co-Creation Workflows
Abstract

Despite Generative AI (GenAI) systems' potential for enhancing content creation, users often struggle to effectively integrate GenAI into their creative workflows. Core challenges include misalignment of AI-generated content with user intentions (intent elicitation and alignment), user uncertainty around how to best communicate their intents to the AI system (prompt formulation), and insufficient flexibility of AI systems to support diverse creative workflows (workflow flexibility). Motivated by these challenges, we created IntentTagger: a system for slide creation based on the notion of Intent Tags—small, atomic conceptual units that encapsulate user intent—for exploring granular and non-linear micro-prompting interactions for Human-GenAI co-creation workflows. Our user study with 12 participants provides insights into the value of flexibly expressing intent across varying levels of ambiguity, meta-intent elicitation, and the benefits and challenges of intent tag-driven workflows. We conclude by discussing the broader implications of our findings and design considerations for GenAI-supported content creation workflows.

Authors
Frederic Gmeiner
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Nicolai Marquardt
Microsoft Research, Redmond, Washington, United States
Michael Bentley
Microsoft, Redmond, Washington, United States
Hugo Romat
Microsoft, Seattle, Washington, United States
Michel Pahud
Microsoft Research, Redmond, Washington, United States
David Brown
Microsoft Research, Redmond, Washington, United States
Asta Roseway
Microsoft Research, Redmond, Washington, United States
Nikolas Martelaro
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Kenneth Holstein
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Ken Hinckley
Microsoft Research, Redmond, Washington, United States
Nathalie Riche
Microsoft Research, Redmond, Washington, United States
DOI

10.1145/3706598.3713861

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713861

Video
A Matter of Perspective(s): Contrasting Human and LLM Argumentation in Subjective Decision-Making on Subtle Sexism
Abstract

In subjective decision-making, where decisions are based on contextual interpretation, Large Language Models (LLMs) can be integrated to present users with additional rationales to consider. The diversity of these rationales is mediated by the ability to consider the perspectives of different social actors; however, it remains unclear whether and how models differ in the distribution of perspectives they provide. We compare the perspectives taken by humans and different LLMs when assessing subtle sexism scenarios. We show that these perspectives can be classified within a finite set (perpetrator, victim, decision-maker), consistently present in argumentations produced by humans and LLMs, but in different distributions and combinations, demonstrating differences and similarities with human responses, and between models. We argue for the need to systematically evaluate LLMs’ perspective-taking to identify the most suitable models for a given decision-making task. We discuss the implications for model evaluation.

Authors
Paula Akemi Aoyagui
University of Toronto, Toronto, Ontario, Canada
Kelsey Stemmler
University of Toronto, Toronto, Ontario, Canada
Sharon A. Ferguson
University of Toronto, Toronto, Ontario, Canada
Young-Ho Kim
NAVER AI Lab, Seongnam, Gyeonggi, Republic of Korea
Anastasia Kuzminykh
University of Toronto, Toronto, Ontario, Canada
DOI

10.1145/3706598.3713248

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713248

Video
“Housing Diversity Means Diverse Housing”: Blending Generative AI into Speculative Design in Rural Co-Housing Communities
Abstract

In response to various environmental and societal challenges, co-housing has emerged to support social cohesion, grassroots innovation and ecological regeneration. Co-housing communities typically have smaller personal spaces, closer neighbourly relationships, and engage in more mutually supportive sustainable practices. To understand such communities’ motivations and visions, we developed a speculative design tool that harnesses Generative Artificial Intelligence (GenAI) to facilitate the envisioning of alternative future scenarios that challenge prevailing values, beliefs, lifestyles, and ways of knowing in contemporary society. Within the context of co-housing communities, we conducted a participatory design study with participants in co-creating their future communities. This paper unpacks implications and also reflects on the co-design approach employing GenAI. Our main findings highlight that GenAI, as a catalyst for imagination, empowers individuals to create visualisations that pose questions through a plural and situated speculative discourse.

Authors
Hongyi Tao
The University of Queensland, Brisbane, Australia
Dhaval Vyas
University of Queensland, St. Lucia, Australia
DOI

10.1145/3706598.3713906

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713906

Video