Sensemaking with AI A

Conference Name
CHI 2024
Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models
Abstract

Sensemaking in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria. Prior research and our formative study found that people would benefit from reading an overview of an information space upfront, including the criteria others previously found useful. However, existing sensemaking tools struggle with the "cold-start" problem -- not only do they require significant input from previous users to generate and share these overviews, but such overviews may also turn out to be biased and incomplete. In this work, we introduce a novel system, Selenite, which leverages Large Language Models (LLMs) as reasoning machines and knowledge retrievers to automatically produce a comprehensive overview of options and criteria to jumpstart users' sensemaking processes. Subsequently, Selenite also adapts as people use it, helping users find, read, and navigate unfamiliar information in a systematic yet personalized manner. Through three studies, we found that Selenite produced accurate and high-quality overviews reliably, significantly accelerated users' information processing, and effectively improved their overall comprehension and sensemaking experience.

Authors
Michael Xieyang Liu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Tongshuang Wu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Tianying Chen
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Franklin Mingzhe Li
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Aniket Kittur
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Brad A. Myers
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Paper URL

https://doi.org/10.1145/3613904.3642149

Video
Supporting Sensemaking of Large Language Model Outputs at Scale
Abstract

Large language models (LLMs) are capable of generating multiple responses to a single prompt, yet little effort has been expended to help end-users or system designers make use of this capability. In this paper, we explore how to present many LLM responses at once. We design five features, which include both pre-existing and novel methods for computing similarities and differences across textual documents, as well as how to render their outputs. We report on a controlled user study (n=24) and eight case studies evaluating these features and how they support users in different tasks. We find that the features support a wide variety of sensemaking tasks and even make tasks tractable that our participants previously considered to be too difficult to attempt. Finally, we present design guidelines to inform future explorations of new LLM interfaces.

Award
Honorable Mention
Authors
Katy Ilonka Gero
Harvard University, Cambridge, Massachusetts, United States
Chelse Swoopes
Harvard University, Cambridge, Massachusetts, United States
Ziwei Gu
Harvard University, Cambridge, Massachusetts, United States
Jonathan K. Kummerfeld
The University of Sydney, Sydney, NSW, Australia
Elena L. Glassman
Harvard University, Allston, Massachusetts, United States
Paper URL

https://doi.org/10.1145/3613904.3642139

Video
Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making
Abstract

In this work, we study the effects of feature-based explanations on distributive fairness of AI-assisted decisions, specifically focusing on the task of predicting occupations from short textual bios. We also investigate how any effects are mediated by humans' fairness perceptions and their reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, we see that such explanations do not enable humans to discern correct and incorrect AI recommendations. Instead, we show that they may affect reliance irrespective of the correctness of AI recommendations. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, this prompts overrides that counter AI recommendations that align with gender stereotypes. Meanwhile, if explanations appear task-relevant, this induces reliance behavior that reinforces stereotype-aligned errors. These results imply that feature-based explanations are not a reliable mechanism to improve distributive fairness.

Award
Honorable Mention
Authors
Jakob Schoeffer
University of Texas at Austin, Austin, Texas, United States
Maria De-Arteaga
The University of Texas at Austin, Austin, Texas, United States
Niklas Kühl
University of Bayreuth, Bayreuth, Germany
Paper URL

https://doi.org/10.1145/3613904.3642621

Video
Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models
Abstract

The field of eXplainable artificial intelligence (XAI) has produced a plethora of methods (e.g., saliency maps) to gain insight into artificial intelligence (AI) models, and has exploded with the rise of deep learning (DL). However, human-participant studies question the efficacy of these methods, particularly when the AI output is wrong. In this study, we collected and analyzed 156 human-generated textual and saliency-based explanations from a question-answering task (N=40) and compared them empirically to state-of-the-art XAI explanations (integrated gradients, conservative LRP, and ChatGPT) in a human-participant study (N=136). Our findings show that participants found human saliency maps to be more helpful in explaining AI answers than machine saliency maps, but performance negatively correlated with trust in the AI model and explanations. This finding hints at the dilemma of AI errors in explanation, where helpful explanations can lead to lower task performance when they support wrong AI predictions.

Authors
Marvin Pafla
University of Waterloo, Waterloo, Ontario, Canada
Kate Larson
University of Waterloo, Waterloo, Ontario, Canada
Mark Hancock
University of Waterloo, Waterloo, Ontario, Canada
Paper URL

https://doi.org/10.1145/3613904.3642934

Video
"Are You Really Sure?'' Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making
Abstract

In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective, "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-confidence appropriateness and reliance appropriateness. Then, in our second study, we propose three calibration mechanisms and compare their effects on humans' self-confidence and user experience. Subsequently, our third study investigates the effects of self-confidence calibration on AI-assisted decision-making. Results show that calibrating human self-confidence enhances human-AI team performance and encourages more rational reliance on AI (in some aspects) compared to uncalibrated baselines. Finally, we discuss our main findings and provide implications for designing future AI-assisted decision-making interfaces.

Authors
Shuai Ma
The Hong Kong University of Science and Technology, Hong Kong, China
Xinru Wang
Purdue University, West Lafayette, Indiana, United States
Ying Lei
East China Normal University, Shanghai, China
Chuhan Shi
Southeast University, Nanjing, China
Ming Yin
Purdue University, West Lafayette, Indiana, United States
Xiaojuan Ma
Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Paper URL

https://doi.org/10.1145/3613904.3642671

Video