AI Explanations and Decision Support in Healthcare

Conference Name
CHI 2026
Metacognitive Demands and Strategies While Using Off-The-Shelf AI Conversational Agents for Health Information Seeking
Abstract

As Artificial Intelligence (AI) conversational agents become widespread, people are increasingly using them for health information seeking. The use of off-the-shelf conversational agents for health information seeking could place high metacognitive demands (the need for extensive monitoring and control of one's own thought process) on individuals, which could compromise their experience of seeking health information. However, currently, the specific demands that arise while using conversational agents for health information seeking, and the strategies people use to cope with those demands, remain unknown. To address these gaps, we conducted a think-aloud study with 15 participants as they sought health information using an off-the-shelf AI conversational agent. We identify the metacognitive demands such systems impose and the strategies people adopt in response, and propose considerations for designing beyond off-the-shelf interfaces to reduce these demands and support better user experiences and affordances in health information seeking.

Award
Honorable Mention
Authors
Shri Harini Ramesh
University of Calgary, Calgary, Alberta, Canada
Foroozan Daneshzand
Simon Fraser University, Burnaby, British Columbia, Canada
Babak Rashidi
Ottawa General Campus, Ottawa, Ontario, Canada
Shriti Raj
Stanford University, Palo Alto, California, United States
Hariharan Subramonyam
Stanford University, Stanford, California, United States
Fateme Rajabiyazdi
University of Calgary, Calgary, Alberta, Canada
Intelligent Reasoning Cues: A Framework and Case Study of the Roles of AI Information in Complex Decisions
Abstract

Artificial intelligence (AI)-based decision support systems can be highly accurate yet still fail to support users or improve decisions. Existing theories of AI-assisted decision-making focus on calibrating reliance on AI advice, leaving it unclear how different system designs might influence the reasoning processes underneath. We address this gap by reconsidering AI interfaces as collections of intelligent reasoning cues: discrete pieces of AI information that can individually influence decision-making. We then explore the roles of eight types of reasoning cues in a high-stakes clinical decision (treating patients with sepsis in intensive care). Through contextual inquiries with six teams and a think-aloud study with 25 physicians, we find that reasoning cues have distinct patterns of influence that can directly inform design. Our results also suggest that reasoning cues should prioritize tasks with high variability and discretion, adapt to ensure compatibility with evolving decision needs, and provide complementary, rigorous insights on complex cases.

Authors
Venkatesh Sivaraman
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Eric Paul Mason
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Mengfan Ellen Li
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Jessica Tong
Pomona College, Claremont, California, United States
Andrew Joseph King
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Jeremy M. Kahn
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Adam Perer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Design and Multi-level Evaluation of MAP-X: a Medically Aligned, Patient-Centered AI Explanation System
Abstract

Health artificial intelligence (AI) is often developed in high-stakes, data-scarce contexts, where both clinical validity and patient comprehension are critical; however, rigorous, multi-level evaluation of explanations in real-world patient-facing settings remains challenging. To enhance patient understanding and trust, we propose a practical blueprint for designing and evaluating medically aligned, patient-centered explanations, instantiated in MAP-X, a system that employs a large language model (LLM) with retrieval-augmented generation (RAG) to translate clinical assessments into an understandable interface. We conducted a three-phase evaluation following a multi-level validation framework: a functional evaluation of faithfulness, a clinician evaluation of workflow suitability, and a patient evaluation of perceived understanding and trust. Our findings suggest that MAP-X may support clinical adoption. In the patient study, MAP-X showed higher reported trust and a positive trend in explanation satisfaction. Interviews suggested clearer understanding of assessment results. Overall, MAP-X produced clinically relevant explanations with reasonable faithfulness and usability. Clinician oversight remains necessary.

Authors
Yuyoung Kim
HAII Corp., Seoul, Korea, Republic of
Minjung Kim
HAII Corp., Seoul, Korea, Republic of
Saebyeol Kim
HAII Corp., Seoul, Korea, Republic of
Sooyoun Cho
HAII Corp., Seoul, Korea, Republic of
Jinwoo Kim
HAII Corp., Seoul, Korea, Republic of
MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard
Abstract

Advances in data collection enable the capture of rich patient-generated data: from passive sensing (e.g., wearables and smartphones) to active self-reports (e.g., cross-sectional surveys and ecological momentary assessments). Although prior research has demonstrated the utility of patient-generated data in mental healthcare, significant challenges remain in effectively presenting these data streams along with clinical data (e.g., clinical notes) for clinical decision-making. Through co-design sessions with five clinicians, we propose MIND, a large language model-powered dashboard designed to present clinically relevant multimodal data insights for mental healthcare. MIND presents multimodal insights through narrative text, complemented by charts communicating underlying data. Our user study (N=16) demonstrates that clinicians perceive MIND as a significant improvement over baseline methods, reporting improved performance to reveal hidden and clinically relevant data insights (p<.001) and support their decision-making (p=.004). Grounded in the study results, we discuss future research opportunities to integrate data narratives in broader clinical practices.

Award
Honorable Mention
Authors
Ruishi Zou
Columbia University, New York, New York, United States
Shiyu Xu
Columbia University, New York, New York, United States
Margaret E. Morris
University of Washington, Seattle, Washington, United States
Jihan Ryu
Hamilton-Madison House, New York, New York, United States
Timothy D. Becker
New York-Presbyterian Hospital/Weill Cornell Medicine, New York, New York, United States
Nicholas Allen
University of Oregon, Eugene, Oregon, United States
Anne Marie Albano
Columbia University, New York, New York, United States
Randy Auerbach
Columbia University, New York City, New York, United States
Daniel A. Adler
Cornell University, New York, New York, United States
Varun Mishra
Northeastern University, Boston, Massachusetts, United States
Lace M. Padilla
Northeastern University, Boston, Massachusetts, United States
Dakuo Wang
Northeastern University, Boston, Massachusetts, United States
Ryan Sultan
Columbia University, New York, New York, United States
Xuhai "Orson" Xu
Columbia University, New York City, New York, United States
Video
Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics
要旨

Clinician skepticism toward opaque AI hinders adoption in high-stakes healthcare. We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations and LLM-driven diagnostic recommendations. Through a within-subjects counterbalanced study with 16 clinicians across nephrology and obstetrics, we comprehensively evaluated AICare using objective measures (task completion time and error rate), subjective assessments (NASA-TLX, SUS, and confidence ratings), and semi-structured interviews. Our findings indicate that AICare reduced cognitive workload. Beyond performance metrics, qualitative analysis reveals that trust is actively constructed through verification, with interaction strategies diverging by expertise: junior clinicians used the system as cognitive scaffolding to structure their analysis, while experts engaged in adversarial verification to challenge the AI's logic. This work offers design implications for creating AI systems that function as transparent partners, accommodating diverse reasoning styles to augment rather than replace clinical judgment.

Authors
Yinghao Zhu
Peking University, Beijing, China
Dehao Sui
Peking University, Beijing, China
Zixiang Wang
Peking University, Beijing, China
Xuning Hu
Xi'an Jiaotong-Liverpool University, Suzhou, China
Lei Gu
Peking University, Beijing, China
Yifan Qi
Nankai University, Tianjin, China
Tianchen Wu
Peking University Third Hospital, Beijing, China
Ling Wang
Affiliated Xuzhou Municipal Hospital of Xuzhou Medical University, Jiangsu, China
Yuan Wei
Peking University Third Hospital, Beijing, China
Wen Tang
Peking University, Beijing, China
Zhihan Cui
Peking University, Beijing, China
Yasha Wang
Peking University, Beijing, China
Lequan Yu
The University of Hong Kong, Hong Kong, China
Ewen M Harrison
The University of Edinburgh, Edinburgh, United Kingdom
Junyi Gao
University of Edinburgh, Edinburgh, United Kingdom
Liantao Ma
Peking University, Beijing, China
Video
Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
Abstract

Large Language Models (LLMs) are typically evaluated through general or domain-specific benchmarks testing capabilities that often lack grounding in the lived realities of end users. Critical domains such as healthcare require evaluations that extend beyond artificial or simulated tasks to reflect the everyday needs, cultural practices, and nuanced contexts of communities. We propose Samiksha, a community-driven evaluation pipeline co-created with civil-society organizations (CSOs) and community members. Our approach enables scalable, automated benchmarking through a culturally aware, community-driven pipeline in which community feedback informs what to evaluate, how the benchmark is built, and how outputs are scored. We demonstrate this approach in the health domain in India. Our analysis highlights how current multilingual LLMs address nuanced community health queries, while also offering a scalable pathway for contextually grounded and inclusive LLM evaluation.

Award
Honorable Mention
Authors
Hamna Hamna
Microsoft Corporation, Bangalore, Karnataka, India
Gayatri Bhat
Karya, Bengaluru, India
Sourabrata Mukherjee
Microsoft Research, Bengaluru, Karnataka, India
Faisal M. Lalani
Collective Intelligence Project, New York, New York, United States
Evan Hadfield
Collective Intelligence Project, New York, New York, United States
Divya Siddarth
Collective Intelligence Project, New York, New York, United States
Kalika Bali
Microsoft Research Lab India, Bangalore, India
Sunayana Sitaram
Microsoft Research India, Bangalore, Karnataka, India
Video
Constructing Everyday Well-Being: Insights from God-Saeng (God生) for Personal Informatics
Abstract

While Personal Informatics (PI) systems support behavior change, everyday well-being involves more than achieving individual target behaviors. It is shaped by cultural narratives that give actions meaning. In South Korea, the God-Saeng (God生) phenomenon—encompassing disciplined, collective, and publicly documented self-improvement practices—offers a lens into how well-being is negotiated in daily life. We conducted a 10-day probe (N=24) with bite-sized missions to examine how young adults engaged in God-Saeng. Participants relied on planning practices, accountability infrastructures, and datafication to stabilize themselves, yet these same routines also intensified pressures toward self-monitoring and performance. They navigated tensions between consistency and flexibility, authenticity and visibility, and productivity and broader values such as relationships, and reinterpreted ordinary activities through sociocultural contexts. These insights suggest design opportunities for PI systems that move beyond tracking, toward digital instruments that help users negotiate tensions, make meaning, and reflexively understand how technologies participate in their culturally and existentially situated well-being.

Authors
Inhwa Song
Princeton University, Princeton, New Jersey, United States
Kwangyoung Lee
KAIST, Daejeon, Korea, Republic of
Janghee Cho
National University of Singapore, Singapore, Singapore
Amon Rapp
University of Turin, Torino, Italy
Hwajung Hong
KAIST, Daejeon, Korea, Republic of