AI-Assisted Clinical Diagnosis and Reasoning | CHI 2026 | Paper Guilds (ペーパーギルド)

CHI 2026

Multidisciplinary tumor boards (MTBs) bring specialists together to identify therapies for complex cancer cases, but preparing for them is time-intensive: clinicians must extract key details from extensive records and evaluate treatment options. While large language models (LLMs) show promise in medicine for basic tasks such as summarizing notes, little is known about their role in high-stakes tasks like MTB preparation. We conducted a mixed-methods study in which 16 oncologists used two AI systems to prepare patient cases for MTB: an off-the-shelf assistant (Copilot) and a task-specific multi-agent system (Healthcare Agent Orchestrator, HAO). We analyzed oncologists' prompts, the AI responses, and oncologists' perceptions of AI. Participants showed greater willingness to adopt HAO but were often overconfident in AI summaries and skeptical of AI-recommended therapies. Trust calibration strategies, such as source links and agent trajectories, failed to align trust with system capabilities. We conclude with recommendations for how AI systems should be built to support clinicians in high-stakes tasks.

Northeastern University, Boston, Massachusetts, United States

Microsoft Research, Redmond, Washington, United States

University of Washington, Seattle, Washington, United States

Stanford University School of Medicine, Stanford, California, United States

Stanford University, Stanford, California, United States

Microsoft, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Stanford University, Stanford, California, United States

Microsoft Research, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Northeastern University, Boston, Massachusetts, United States

Northeastern University, Boston, Massachusetts, United States

Microsoft Research, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft Research, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft Nuance, Palo Alto, California, United States

Microsoft, Redmond, Washington, United States

Large language models (LLMs) are increasingly deployed in healthcare, where they influence diagnostic reasoning and clinical workflows. However, evidence of how clinicians engage with these systems (how they prompt, constrain, and verify output) remains scarce, particularly in low- and middle-income countries (LMICs). We conducted a mixed-methods study with physicians in Pakistan: (1) logging their interactions while they solved expert-designed clinical vignettes with optional LLM assistance, and (2) interviewing 12 participants about generative-AI-supported diagnosis. Findings highlight diverse prompting strategies, ranging from role assignment to cautious scaffolding, with consistent insistence on human oversight. Interviews reveal pragmatic enthusiasm for LLMs as a “second brain” in resource-constrained settings, tempered by skepticism about reliability, privacy, and patient trust. This study contributes evidence of physician-LLM interaction patterns in an LMIC context, a taxonomy of prompting strategies and oversight mechanisms, and design implications for responsible AI integration in healthcare workflows.

LUMS, Lahore, Pakistan

LUMS, Lahore, Pakistan

LUMS, Lahore, Pakistan

LUMS, Lahore, Pakistan

LUMS, Lahore, Pakistan

LUMS, Lahore, Pakistan

LUMS, Lahore, Punjab, Pakistan

LUMS, Lahore, Pakistan

The global shortage and uneven distribution of medical expertise continue to hinder equitable access to accurate diagnostic care. While existing intelligent diagnostic systems have shown promise, most struggle with dual-user interaction and dynamic knowledge integration, which limits their real-world applicability. In this study, we present DiagLink, a dual-user diagnostic assistance system that combines large language models (LLMs), knowledge graphs (KGs), and medical experts to support both patients and physicians. DiagLink uses guided dialogues to elicit patient histories, leverages LLMs and KGs for collaborative reasoning, and incorporates physician oversight for continuous knowledge validation and evolution. The system provides a role-adaptive interface, dynamic visualization of patient history, and unified multi-source evidence to improve both trust and usability. We evaluate DiagLink through a user study, use cases, and expert interviews, demonstrating its effectiveness in improving user satisfaction and diagnostic efficiency and offering insights for the design of future AI-assisted diagnostic systems.

Northeastern University, Shenyang, China

Northeastern University, Shenyang, China

Northeastern University, Shenyang, China

Northeastern University, Shenyang, China

Northeastern University, Shenyang, China

Northeastern University, Shenyang, China

Most trust metrics for intelligent systems are developed for adults, relying on complex reasoning and language that do not align with children's developmental stages. As intelligent systems increasingly engage with young users, evaluating trust in child-AI interaction has become an urgent concern in HCI. In this paper, we present the iterative refinement and validation of the K-AI Trust Questionnaire, a child-centred instrument that integrates dispositional and situational trust components grounded in child-rights principles. Dispositional trust is captured through a child-adapted Propensity to Trust Technology (PTT) scale, while situational trust is assessed through post-interaction items reflecting children's experience with AI. Starting with a sample of 289 children, we conducted psychometric analyses and exploratory testing, culminating in a confirmatory factor analysis on a subsample of 85 children. Results supported a unidimensional structure consistent with the PTT and highlighted the limitations of adult-oriented scales, underscoring the need for developmentally appropriate tools for trustworthy child-AI design.

University of Bari, Bari, Italy

University of Bari Aldo Moro, Bari, BA, Italy

University of Amsterdam, Amsterdam, Netherlands

University of Bari, Bari, Italy

We present LubDubDecoder, a system that enables fine-grained monitoring of micro-cardiac vibrations associated with the opening and closing of heart valves across a range of hearables. Our system transforms the built-in speaker, the only transducer common to all hearables, into an acoustic sensor that captures the coarse "lub-dub" heart sounds, leverages their shared temporal and spectral structure to reconstruct the subtle seismocardiography (SCG) and gyrocardiography (GCG) waveforms, and extracts the timing of key micro-cardiac events. In an IRB-approved feasibility study with 25 users, our system achieves correlations of 0.88 to 0.95 with chest-mounted reference measurements in within-user and cross-user evaluations, and it generalizes to unseen hearables through a zero-effort adaptation scheme, reaching a correlation of 0.91. The system remains robust across remounting sessions and during music playback.
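The abstract reports agreement with the chest-mounted references as "correlations" without naming the coefficient. As a minimal sketch, assuming Pearson's r computed sample by sample on time-aligned, equal-length waveforms (r = cov(x, y) / (σx σy)), the metric could be computed as below. The function name `pearson_r`, the sampling rate, and the synthetic signals are hypothetical illustrations, not the paper's pipeline or data.

```python
import math


def pearson_r(reconstructed, reference):
    """Pearson correlation between two time-aligned, equal-length waveforms.

    Assumption: the reported "correlation" is Pearson's r over raw samples.
    """
    if len(reconstructed) != len(reference) or len(reference) < 2:
        raise ValueError("waveforms must have the same length (>= 2 samples)")
    n = len(reference)
    mean_x = sum(reconstructed) / n
    mean_y = sum(reference) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(reconstructed, reference))
    var_x = sum((x - mean_x) ** 2 for x in reconstructed)
    var_y = sum((y - mean_y) ** 2 for y in reference)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant waveform")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: a synthetic decaying "reference" trace and a scaled
# "reconstruction" with added 50 Hz interference (not data from the paper).
fs = 500  # hypothetical sampling rate in Hz
t = [i / fs for i in range(fs)]  # one second of sample times
reference = [math.sin(2 * math.pi * 1.2 * ti) * math.exp(-3 * ti) for ti in t]
reconstructed = [
    0.8 * r + 0.05 * math.sin(2 * math.pi * 50 * ti) for r, ti in zip(reference, t)
]

print(round(pearson_r(reconstructed, reference), 3))
```

Because Pearson's r is invariant to gain and offset, a reconstruction with the right shape but the wrong amplitude can still score close to 1; amplitude fidelity would need a separate metric such as RMSE.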

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Tsinghua University, Beijing, China

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Shanghai Jiao Tong University, Shanghai, China

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Tsinghua University, Beijing, China

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Interpreting electrocardiograms (ECGs) is an important but complex and error-prone task. While diagnostic support algorithms exist, how their support is displayed, and how clinicians interact with ECG diagnostic and clinical decision support systems in general, remain underexplored. In this preregistered experiment, we studied how providing clinicians with different versions of diagnostic support affects ECG interpretation. All four support types improved diagnostic accuracy compared to a no-support control condition, but the most effective was support offering visual ECG trace markings. User experience, measured as satisfaction of the psychological needs for competence and security, was highest when clinicians first viewed the ECG independently and then received support in a second stage. This two-stage support also resulted in the shortest diagnosis times. We conclude with design and research implications for creating clinician-algorithm support interactions that, as in the present study, improve user experience, efficacy, and effectiveness, and that may ultimately contribute to patient safety.

Julius-Maximilians-Universität Würzburg, Würzburg, Germany

University Hospital Würzburg, Würzburg, Germany

University Hospital Würzburg, Würzburg, Germany

University Hospital Würzburg, Würzburg, Germany

Wearables enable users to generate electrocardiogram (ECG) data and receive algorithmic rhythm interpretations. While cardiologists increasingly use this data, little is known about how point-of-care clinicians perceive and anticipate using it. These clinicians are the main point of contact for many patients and determine access to further investigations and specialists. We conducted vignette-based interviews with 33 primary and emergency care clinicians to explore how they make sense of patient-generated ECG data and which factors shape its anticipated use in decision making. We found that patient-generated data introduces diagnostic uncertainty, shaped by legitimacy concerns, interpretation challenges, the influence of the wider clinical context on trust and confidence, and the balancing of patient risk against professional risk. This duality of risk often overrode earlier considerations and determined how clinicians responded to patient-generated data. We discuss design opportunities for uncertainty- and risk-aware technology that can support the adoption of patient-generated data in everyday clinical practice.

University of Bristol, Bristol, England, United Kingdom

University of Bristol, Bristol, United Kingdom

University of Bristol, Bristol, United Kingdom

University of Bristol, Bristol, United Kingdom
