AI Explanations and Decision Support in Healthcare

Conference Name
CHI 2026
Metacognitive Demands and Strategies While Using Off-The-Shelf AI Conversational Agents for Health Information Seeking
Abstract

As Artificial Intelligence (AI) conversational agents become widespread, people are increasingly using them for health information seeking. The use of off-the-shelf conversational agents for health information seeking could place high metacognitive demands (the need for extensive monitoring and control of one's own thought process) on individuals, which could compromise their experience of seeking health information. However, currently, the specific demands that arise while using conversational agents for health information seeking, and the strategies people use to cope with those demands, remain unknown. To address these gaps, we conducted a think-aloud study with 15 participants as they sought health information using an off-the-shelf AI conversational agent. We identify the metacognitive demands such systems impose and the strategies people adopt in response, and propose considerations for designing beyond off-the-shelf interfaces to reduce these demands and support better user experiences and affordances in health information seeking.

Award
Honorable Mention
Authors
Shri Harini Ramesh
University of Calgary, Calgary, Alberta, Canada
Foroozan Daneshzand
Simon Fraser University, Burnaby, British Columbia, Canada
Babak Rashidi
Ottawa General Campus, Ottawa, Ontario, Canada
Shriti Raj
Stanford University, Palo Alto, California, United States
Hariharan Subramonyam
Stanford University, Stanford, California, United States
Fateme Rajabiyazdi
University of Calgary, Calgary, Alberta, Canada
Intelligent Reasoning Cues: A Framework and Case Study of the Roles of AI Information in Complex Decisions
Abstract

Artificial intelligence (AI)-based decision support systems can be highly accurate yet still fail to support users or improve decisions. Existing theories of AI-assisted decision-making focus on calibrating reliance on AI advice, leaving it unclear how different system designs might influence the reasoning processes underneath. We address this gap by reconsidering AI interfaces as collections of intelligent reasoning cues: discrete pieces of AI information that can individually influence decision-making. We then explore the roles of eight types of reasoning cues in a high-stakes clinical decision (treating patients with sepsis in intensive care). Through contextual inquiries with six teams and a think-aloud study with 25 physicians, we find that reasoning cues have distinct patterns of influence that can directly inform design. Our results also suggest that reasoning cues should prioritize tasks with high variability and discretion, adapt to ensure compatibility with evolving decision needs, and provide complementary, rigorous insights on complex cases.

Authors
Venkatesh Sivaraman
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Eric Paul Mason
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Mengfan Ellen Li
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Jessica Tong
Pomona College, Claremont, California, United States
Andrew Joseph King
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Jeremy M. Kahn
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Adam Perer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Design and Multi-level Evaluation of MAP-X: a Medically Aligned, Patient-Centered AI Explanation System
Abstract

Health artificial intelligence (AI) is often developed in high-stakes, data-scarce contexts, where both clinical validity and patient comprehension are critical; however, rigorous, multi-level evaluation of explanations in real-world patient-facing settings remains challenging. To enhance patient understanding and trust, we propose a practical blueprint for designing and evaluating medically aligned, patient-centered explanations, instantiated in MAP-X, a system that employs a large language model (LLM) with retrieval-augmented generation (RAG) to translate clinical assessments into an understandable interface. We conducted a three-phase evaluation following a multi-level validation framework: a functional evaluation of faithfulness, a clinician evaluation of workflow suitability, and a patient evaluation of perceived understanding and trust. Our findings suggest that MAP-X may support clinical adoption. In the patient study, MAP-X showed higher reported trust and a positive trend in explanation satisfaction. Interviews suggested clearer understanding of assessment results. Overall, MAP-X produced clinically relevant explanations with reasonable faithfulness and usability. Clinician oversight remains necessary.

Authors
Yuyoung Kim
HAII Corp., Seoul, Korea, Republic of
Minjung Kim
HAII Corp., Seoul, Korea, Republic of
Saebyeol Kim
HAII Corp., Seoul, Korea, Republic of
Sooyoun Cho
HAII Corp., Seoul, Korea, Republic of
Jinwoo Kim
HAII Corp., Seoul, Korea, Republic of
MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard
Abstract

Advances in data collection enable the capture of rich patient-generated data: from passive sensing (e.g., wearables and smartphones) to active self-reports (e.g., cross-sectional surveys and ecological momentary assessments). Although prior research has demonstrated the utility of patient-generated data in mental healthcare, significant challenges remain in effectively presenting these data streams along with clinical data (e.g., clinical notes) for clinical decision-making. Through co-design sessions with five clinicians, we propose MIND, a large language model-powered dashboard designed to present clinically relevant multimodal data insights for mental healthcare. MIND presents multimodal insights through narrative text, complemented by charts communicating underlying data. Our user study (N=16) demonstrates that clinicians perceive MIND as a significant improvement over baseline methods, reporting improved performance to reveal hidden and clinically relevant data insights (p<.001) and support their decision-making (p=.004). Grounded in the study results, we discuss future research opportunities to integrate data narratives in broader clinical practices.

Award
Honorable Mention
Authors
Ruishi Zou
Columbia University, New York, New York, United States
Shiyu Xu
Columbia University, New York, New York, United States
Margaret E. Morris
University of Washington, Seattle, Washington, United States
Jihan Ryu
Hamilton-Madison House, New York, New York, United States
Timothy D. Becker
New York-Presbyterian Hospital/Weill Cornell Medicine, New York, New York, United States
Nicholas Allen
University of Oregon, Eugene, Oregon, United States
Anne Marie Albano
Columbia University, New York, New York, United States
Randy Auerbach
Columbia University, New York City, New York, United States
Daniel A. Adler
Cornell University, New York, New York, United States
Varun Mishra
Northeastern University, Boston, Massachusetts, United States
Lace M. Padilla
Northeastern University, Boston, Massachusetts, United States
Dakuo Wang
Northeastern University, Boston, Massachusetts, United States
Ryan Sultan
Columbia University, New York, New York, United States
Xuhai "Orson" Xu
Columbia University, New York City, New York, United States
Video
Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics
要旨

Clinician skepticism toward opaque AI hinders adoption in high-stakes healthcare. We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations and LLM-driven diagnostic recommendations. Through a within-subjects counterbalanced study with 16 clinicians across nephrology and obstetrics, we comprehensively evaluated AICare using objective measures (task completion time and error rate), subjective assessments (NASA-TLX, SUS, and confidence ratings), and semi-structured interviews. Our findings indicate that AICare reduced cognitive workload. Beyond performance metrics, qualitative analysis reveals that trust is actively constructed through verification, with interaction strategies diverging by expertise: junior clinicians used the system as cognitive scaffolding to structure their analysis, while experts engaged in adversarial verification to challenge the AI's logic. This work offers design implications for creating AI systems that function as transparent partners, accommodating diverse reasoning styles to augment rather than replace clinical judgment.

Authors
Yinghao Zhu
Peking University, Beijing, China
Dehao Sui
Peking University, Beijing, China
Zixiang Wang
Peking University, Beijing, China
Xuning Hu
Xi'an Jiaotong-Liverpool University, Suzhou, China
Lei Gu
Peking University, Beijing, China
Yifan Qi
Nankai University, Tianjin, China
Tianchen Wu
Peking University Third Hospital, Beijing, China
Ling Wang
Affiliated Xuzhou Municipal Hospital of Xuzhou Medical University, Jiangsu, China
Yuan Wei
Peking University Third Hospital, Beijing, China
Wen Tang
Peking University, Beijing, China
Zhihan Cui
Peking University, Beijing, China
Yasha Wang
Peking University, Beijing, China
Lequan Yu
The University of Hong Kong, Hong Kong, China
Ewen M Harrison
The University of Edinburgh, Edinburgh, United Kingdom
Junyi Gao
University of Edinburgh, Edinburgh, United Kingdom
Liantao Ma
Peking University, Beijing, China
Video
Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
Abstract

Large Language Models (LLMs) are typically evaluated through general or domain-specific benchmarks testing capabilities that often lack grounding in the lived realities of end users. Critical domains such as healthcare require evaluations that extend beyond artificial or simulated tasks to reflect the everyday needs, cultural practices, and nuanced contexts of communities. We propose Samiksha, a community-driven evaluation pipeline co-created with civil-society organizations (CSOs) and community members. Our approach enables scalable, automated benchmarking through a culturally aware, community-driven pipeline in which community feedback informs what to evaluate, how the benchmark is built, and how outputs are scored. We demonstrate this approach in the health domain in India. Our analysis highlights how current multilingual LLMs address nuanced community health queries, while also offering a scalable pathway for contextually grounded and inclusive LLM evaluation.

Award
Honorable Mention
Authors
Hamna Hamna
Microsoft Corporation, Bangalore, Karnataka, India
Gayatri Bhat
Karya, Bengaluru, India
Sourabrata Mukherjee
Microsoft Research, Bengaluru, Karnataka, India
Faisal M. Lalani
Collective Intelligence Project, New York, New York, United States
Evan Hadfield
Collective Intelligence Project, New York, New York, United States
Divya Siddarth
Collective Intelligence Project, New York, New York, United States
Kalika Bali
Microsoft Research Lab India, Bangalore, India
Sunayana Sitaram
Microsoft Research India, Bangalore, Karnataka, India
Video
Constructing Everyday Well-Being: Insights from God-Saeng (God生) for Personal Informatics
Abstract

While Personal Informatics (PI) systems support behavior change, everyday well-being involves more than achieving individual target behaviors. It is shaped by cultural narratives that give actions meaning. In South Korea, the God-Saeng (God生) phenomenon—encompassing disciplined, collective, and publicly documented self-improvement practices—offers a lens into how well-being is negotiated in daily life. We conducted a 10-day probe (N=24) with bite-sized missions to examine how young adults engaged in God-Saeng. Participants relied on planning practices, accountability infrastructures, and datafication to stabilize themselves, yet these same routines also intensified pressures toward self-monitoring and performance. They navigated tensions between consistency and flexibility, authenticity and visibility, and productivity and broader values such as relationships, and reinterpreted ordinary activities through sociocultural contexts. These insights suggest design opportunities for PI systems that move beyond tracking, toward digital instruments that help users negotiate tensions, make meaning, and reflexively understand how technologies participate in their culturally and existentially situated well-being.

Authors
Inhwa Song
Princeton University, Princeton, New Jersey, United States
Kwangyoung Lee
KAIST, Daejeon, Korea, Republic of
Janghee Cho
National University of Singapore, Singapore, Singapore
Amon Rapp
University of Turin, Torino, Italy
Hwajung Hong
KAIST, Daejeon, Korea, Republic of