“Do I Trust the AI?” Towards Trustworthy AI-Assisted Diagnosis: Understanding User Perception in LLM-Supported Clinical Reasoning

Large language models (LLMs) have shown considerable potential in supporting medical diagnosis. However, their effective integration into clinical workflows is hindered by physicians' difficulty in perceiving and trusting LLM capabilities, which often results in miscalibrated trust. Existing model evaluations primarily emphasize standardized benchmarks and predefined tasks, offering limited insight into clinical reasoning practices. Moreover, research on human–AI collaboration has rarely examined physicians' perceptions of LLMs' clinical reasoning capability. In this work, we investigate how physicians perceive LLMs' capabilities in the clinical reasoning process. We designed clinical cases, collected the corresponding analyses, and obtained evaluations from physicians (N=37) to quantify their perception of LLM diagnostic capabilities. By comparing these perceived evaluations with benchmark performance, our study highlights the aspects of clinical reasoning that physicians value and underscores the limitations of benchmark-based evaluation. We further discuss implications and opportunities for enhancing trustworthy collaboration between physicians and LLMs in LLM-supported clinical reasoning.

ShanghaiTech University, Shanghai, China

ShanghaiTech University, Shanghai, China

ShanghaiTech University, Shanghai, China

Shanghai Clinical Research and Trials Center, ShanghaiTech University, Shanghai, China

ShanghaiTech University, Shanghai, China

ACM CHI Conference on Human Factors in Computing Systems

Area 1 + 2 + 3: theatre

7 presentations

Start time: 2026-04-17 18:00:00

End time: 2026-04-17 19:30:00