AI-Assisted Clinical Diagnosis and Reasoning

Conference name
CHI 2026
Exploring the Future of AI in Clinical Collaboration: A Study on Tumor Board Case Preparation
Abstract

Multidisciplinary tumor boards (MTBs) bring specialists together to identify therapies for complex cancer cases, but preparing for them is time-intensive. Clinicians must extract key details from extensive records and evaluate treatment options. While large language models (LLMs) show promise in medicine for basic tasks like summarizing notes, little is known about their role in high-stakes tasks like MTB preparation. We conducted a mixed-methods study with 16 oncologists using two AI systems to prepare patient cases for MTB: an off-the-shelf assistant (Copilot) and a task-specific multi-agent system (Healthcare Agent Orchestrator, HAO). We analyzed oncologists' prompts, AI responses, and oncologists' perceptions of AI. Participants showed greater willingness to adopt HAO but were often overconfident in AI summaries and skeptical of AI-recommended therapies. Trust calibration strategies, such as source links and agent trajectories, failed to align trust with system capabilities. We conclude with recommendations for how AI systems should be built to support clinicians in high-stakes tasks.

Award
Honorable Mention
Authors
Jiachen Li
Northeastern University, Boston, Massachusetts, United States
Amanda K. Hall
Microsoft Research, Redmond, Washington, United States
Ruican Zhong
University of Washington, Seattle, Washington, United States
Selin Everett
Stanford University School of Medicine, Stanford, California, United States
Alyssa Unell
Stanford University, Stanford, California, United States
Hanwen Xu
Microsoft, Redmond, Washington, United States
Matthias Blondeel
Microsoft, Redmond, Washington, United States
Jonathan Carlson
Microsoft Research, Redmond, Washington, United States
Katie Claveau
Microsoft Research, Redmond, Washington, United States
Thulasee Jose
Stanford University, Stanford, California, United States
Tristan Naumann
Microsoft Research, Redmond, Washington, United States
David C. Rhew
Microsoft, Redmond, Washington, United States
Naiteek Sangani
Microsoft, Redmond, Washington, United States
Frank Tuan
Microsoft, Redmond, Washington, United States
Jim Weinstein
Microsoft Research, Redmond, Washington, United States
Varun Mishra
Northeastern University, Boston, Massachusetts, United States
Elizabeth D. Mynatt
Northeastern University, Boston, Massachusetts, United States
Scott Saponas
Microsoft Research, Redmond, Washington, United States
Hao Qiu
Microsoft, Redmond, Washington, United States
Leonardo Schettini
Microsoft, Redmond, Washington, United States
Sam Preston
Microsoft, Redmond, Washington, United States
Aiden Gu
Microsoft Research, Redmond, Washington, United States
Naoto Usuyama
Microsoft Research, Redmond, Washington, United States
Zelalem Gero
Microsoft Research, Redmond, Washington, United States
Cliff Wong
Microsoft Research, Redmond, Washington, United States
Noel Christopher Codella
Microsoft, Redmond, Washington, United States
Hoifung Poon
Microsoft Research, Redmond, Washington, United States
Shrey Jain
Microsoft, Redmond, Washington, United States
Matthew Lungren
Microsoft Nuance, Palo Alto, California, United States
Eric Horvitz
Microsoft, Redmond, Washington, United States
Prompting, Oversight, and Adoption: Physicians’ Use of Large Language Models for Diagnostic Reasoning in an LMIC
Abstract

Large language models (LLMs) are increasingly being deployed in healthcare, influencing diagnostic reasoning and clinical workflows. However, evidence of how clinicians engage with these systems (how they prompt, constrain, and verify outputs) remains scarce, particularly in low- and middle-income countries (LMICs). We conducted a mixed-methods study with physicians in Pakistan: (1) logging their interactions while they solved expert-designed clinical vignettes with optional LLM assistance, and (2) interviewing 12 participants about generative-AI-supported diagnosis. Findings highlight diverse prompting strategies, from role assignment to cautious scaffolding, with consistent insistence on human oversight. Interviews reveal pragmatic enthusiasm for LLMs as a “second brain” in resource-constrained settings, tempered by skepticism about reliability, privacy, and patient trust. This study contributes evidence of physician-LLM interaction patterns in an LMIC context, a taxonomy of prompting strategies and oversight mechanisms, and design implications for responsible AI integration in healthcare workflows.

Authors
Ushna Malik
LUMS, Lahore, Pakistan
Laiba Intizar Ahmad
LUMS, Lahore, Pakistan
Amna Hassan
LUMS, Lahore, Pakistan
Izzah Shafique
LUMS, Lahore, Pakistan
Eilya Mohsin
LUMS, Lahore, Pakistan
Ayesha Ali
LUMS, Lahore, Pakistan
Muhammad Hamad Alizai
LUMS, Lahore, Punjab, Pakistan
Ihsan Ayyub Qazi
LUMS, Lahore, Pakistan
DiagLink: A Dual-User Diagnostic Assistance System by Synergizing Experts with LLMs and Knowledge Graphs
Abstract

The global shortage and uneven distribution of medical expertise continue to hinder equitable access to accurate diagnostic care. While existing intelligent diagnostic systems have shown promise, most struggle with dual-user interaction and dynamic knowledge integration, which limits their real-world applicability. In this study, we present DiagLink, a dual-user diagnostic assistance system that synergizes large language models (LLMs), knowledge graphs (KGs), and medical experts to support both patients and physicians. DiagLink uses guided dialogues to elicit patient histories, leverages LLMs and KGs for collaborative reasoning, and incorporates physician oversight for continuous knowledge validation and evolution. The system provides a role-adaptive interface, dynamically visualized history, and unified multi-source evidence to improve both trust and usability. We evaluate DiagLink through a user study, use cases, and expert interviews, demonstrating its effectiveness in improving user satisfaction and diagnostic efficiency, while offering insights for the design of future AI-assisted diagnostic systems.

Authors
Zihan Zhou
Northeastern University, Shenyang, China
Yinan Liu
Northeastern University, Shenyang, China
Yuyang Xie
Northeastern University, Shenyang, China
Bin Wang
Northeastern University, Shenyang, China
Xiaochun Yang
Northeastern University, Shenyang, China
Zezheng Feng
Northeastern University, Shenyang, China
Do Children Trust AI, and Should They? Designing and Validating a Child-Centred K-AI Trust Scale for Intelligent Systems
Abstract

Most trust metrics for intelligent systems are developed for adults, relying on complex reasoning and language that do not align with children’s developmental stages. As intelligent systems increasingly engage with young users, evaluating trust in child-AI interaction has become an urgent concern in HCI. In this paper, we present the iterative refinement and validation of the K-AI Trust Questionnaire, a child-centred instrument that integrates dispositional and situational trust components grounded in child-rights principles. Dispositional trust is captured through a child-adapted Propensity to Trust Technology (PTT), while situational trust is assessed through post-interaction items reflecting children's experience with AI. Starting with a sample of 289 children, we conducted psychometric analyses and exploratory testing, culminating in a confirmatory factor analysis on a subsample of 85 children. Results supported a unidimensional structure consistent with the PTT, and highlighted the limitations of adult-oriented scales, underscoring the need for developmentally appropriate tools for trustworthy child-AI design.

Award
Best Paper
Authors
Grazia Ragone
University of Bari, Bari, Italy
Paolo Buono
University of Bari Aldo Moro, Bari, BA, Italy
Judith Good
University of Amsterdam, Amsterdam, Netherlands
Rosa Lanzilotti
University of Bari, Bari, Italy
LubDubDecoder: Bringing Micro-Mechanical Cardiac Monitoring to Hearables
Abstract

We present LubDubDecoder, a system that enables fine-grained monitoring of micro-cardiac vibrations associated with the opening and closing of heart valves across a range of hearables. Our system transforms the built-in speaker, the only transducer common to all hearables, into an acoustic sensor that captures the coarse "lub-dub" heart sounds, leverages their shared temporal and spectral structure to reconstruct the subtle seismocardiography (SCG) and gyrocardiography (GCG) waveforms, and extracts the timing of key micro-cardiac events. In an IRB-approved feasibility study with 25 users, our system achieves correlations of 0.88-0.95 with chest-mounted reference measurements in within-user and cross-user evaluations, and generalizes to unseen hearables using a zero-effort adaptation scheme with a correlation of 0.91. Our system is robust across remounting sessions and music playback.

Authors
Siqi Zhang
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Xiyuxing Zhang
Tsinghua University, Beijing, China
Duc Nguyen Tien Vu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Tao Qiang
Shanghai Jiao Tong University, Shanghai, China
Clara Palacios
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Jiangyifei Zhu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Yuntao Wang
Tsinghua University, Beijing, China
Mayank Goel
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Justin Chan
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
AI-Supported Electrocardiogram Interpretation: The Effect of Support Presentation on Diagnostic Accuracy, Psychological Need Satisfaction, and Diagnosis Time
Abstract

Interpreting electrocardiograms (ECGs) is an important but complex and error-prone task. While diagnostic support algorithms exist, how support is displayed, and how clinicians interact with ECG diagnostic and clinical decision support systems in general, remain underexplored. In this preregistered experiment, we studied how providing clinicians with different versions of diagnostic support affects ECG interpretation. All four support types improved diagnostic accuracy compared to a no-support control condition, but the most effective was support offering visual ECG trace markings. User experience, in the form of psychological need satisfaction of competence and security, was highest when clinicians first viewed the ECG independently and then received support in a second stage. This two-stage support also resulted in the shortest diagnosis times. We conclude with design and research implications for creating clinician-algorithm support interactions that, as in the present study, improve user experience, efficacy, and effectiveness, and may ultimately contribute to patient safety.

Authors
Tobias Grundgeiger
Julius-Maximilians-Universität Würzburg, Würzburg, Germany
Louisa Maurer
University Hospital Würzburg, Würzburg, Germany
Carlos Ramon Hölzing
University Hospital Würzburg, Würzburg, Germany
Oliver Happel
University Hospital Würzburg, Würzburg, Germany
Uncertainty and Risk at the Point of Care: Implications of Patient-Generated ECGs and Algorithmic Interpretations for Clinical Decision Making
Abstract

Wearables enable users to generate electrocardiogram (ECG) data and receive algorithmic rhythm interpretations. While cardiologists increasingly use this data, little is known about how point-of-care clinicians perceive and anticipate using it. These clinicians are the main point of contact for many patients and determine access to further investigations and specialists. We conducted vignette-based interviews with 33 primary and emergency care clinicians to explore how they make sense of patient-generated ECG data and which factors shape its anticipated use in decision making. We found that patient-generated data introduces diagnostic uncertainty, shaped by legitimacy concerns, interpretation challenges, the influence of the wider clinical context on trust and confidence, and the balancing of patient risk against professional risk. This duality of risk often overrode earlier considerations, determining how clinicians responded to patient-generated data. We discuss design opportunities for uncertainty- and risk-aware technology that can support the adoption of patient-generated data in everyday clinical practice.

Authors
Rachel Keys
University of Bristol, Bristol, England, United Kingdom
Aisling Ann O'Kane
University of Bristol, Bristol, United Kingdom
Paul Marshall
University of Bristol, Bristol, United Kingdom
Graham Stuart
University of Bristol, Bristol, United Kingdom