Artificial intelligence (AI) in healthcare has the potential to improve patient outcomes, but clinician acceptance remains a critical barrier. We developed a novel decision support interface that provides interpretable treatment recommendations for sepsis, a life-threatening condition in which decisional uncertainty is common, treatment practices vary widely, and poor outcomes can occur even with optimal decisions. This system formed the basis of a mixed-methods study in which 24 intensive care clinicians made AI-assisted decisions on real patient cases. We found that explanations generally increased confidence in the AI, but concordance with specific recommendations varied beyond the binary acceptance or rejection described in prior work. Although clinicians sometimes ignored or trusted the AI, they also often prioritized aspects of the recommendations to follow, reject, or delay in a process we term "negotiation." These results reveal novel barriers to adoption of treatment-focused AI tools and suggest ways to better support differing clinician perspectives.
https://doi.org/10.1145/3544548.3581075
Trust has been recognized as a central variable to explain the resistance to using automated systems (under-trust) and the overreliance on automated systems (over-trust). To achieve appropriate reliance, users’ trust should be calibrated to reflect a system’s capabilities. Studies from various disciplines have examined different interventions to attain such trust calibration. Based on a literature body of 1000+ papers, we identified 96 relevant publications which aimed to calibrate users’ trust in automated systems. To provide an in-depth overview of the state-of-the-art, we reviewed and summarized measurements of the trust calibration, interventions, and results of these efforts. For the numerous promising calibration interventions, we extract common design choices and structure these into four dimensions of trust calibration interventions to guide future studies. Our findings indicate that the measurement of the trust calibration often limits the interpretation of the effects of different interventions. We suggest future directions for this problem.
https://doi.org/10.1145/3544548.3581197
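A minimal sketch of one common way trust calibration is operationalized in this literature, namely reliance-based metrics computed from whether participants follow the AI when it is right and override it when it is wrong. The exact definitions vary across the surveyed papers; the metric names and formulas below are illustrative assumptions, not the review's own measures.

```python
# Hypothetical reliance-based measurement of trust calibration.
# Metric definitions here are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Trial:
    ai_correct: bool          # was the AI recommendation correct?
    human_followed_ai: bool   # did the participant adopt the recommendation?

def reliance_metrics(trials: list[Trial]) -> dict[str, float]:
    """Compute over-reliance, under-reliance, and appropriate-reliance rates."""
    ai_right = [t for t in trials if t.ai_correct]
    ai_wrong = [t for t in trials if not t.ai_correct]
    over = sum(t.human_followed_ai for t in ai_wrong) / max(len(ai_wrong), 1)
    under = sum(not t.human_followed_ai for t in ai_right) / max(len(ai_right), 1)
    appropriate = (
        sum(t.human_followed_ai for t in ai_right)
        + sum(not t.human_followed_ai for t in ai_wrong)
    ) / max(len(trials), 1)
    return {"over_reliance": over, "under_reliance": under,
            "appropriate_reliance": appropriate}

# Example: four trials from one participant.
print(reliance_metrics([
    Trial(True, True), Trial(True, False),
    Trial(False, True), Trial(False, False),
]))
```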
AI explanations have been increasingly used to help people better utilize AI recommendations in AI-assisted decision making. While AI explanations may change over time due to updates of the AI model, little is known about how these changes may affect people’s perceptions and usage of the model. In this paper, we study how varying levels of similarity between the AI explanations before and after a model update affect people’s trust in and satisfaction with the AI model. We conduct randomized human-subject experiments in two decision-making contexts where people have different levels of domain knowledge. Our results show that changes in AI explanations during the model update do not affect people’s tendency to adopt AI recommendations. However, they may change people’s subjective trust in and satisfaction with the AI model by changing both their perceived model accuracy and perceived consistency of AI explanations with their prior knowledge.
https://doi.org/10.1145/3544548.3581366
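An illustrative sketch of how the similarity between explanations before and after a model update could be quantified, assuming feature-attribution explanations. The paper does not specify its similarity manipulation here, so the top-k overlap and rank-correlation measures, variable names, and example values are assumptions.

```python
# Illustrative (not the paper's protocol): compare a feature-attribution
# explanation before vs. after a model update using top-k feature overlap
# and Spearman rank correlation.
from scipy.stats import spearmanr

def explanation_similarity(old_attr: dict[str, float],
                           new_attr: dict[str, float],
                           k: int = 3) -> dict[str, float]:
    features = sorted(set(old_attr) | set(new_attr))
    old = [old_attr.get(f, 0.0) for f in features]
    new = [new_attr.get(f, 0.0) for f in features]
    # Overlap of the k most influential features before and after the update.
    top_old = {f for f, _ in sorted(old_attr.items(), key=lambda x: -abs(x[1]))[:k]}
    top_new = {f for f, _ in sorted(new_attr.items(), key=lambda x: -abs(x[1]))[:k]}
    jaccard = len(top_old & top_new) / len(top_old | top_new)
    # Agreement in the full ranking of attribution magnitudes.
    rho, _ = spearmanr(old, new)
    return {"topk_jaccard": jaccard, "rank_correlation": rho}

before = {"income": 0.6, "age": 0.3, "debt": 0.1, "tenure": 0.05}
after = {"income": 0.5, "debt": 0.3, "age": 0.1, "tenure": 0.02}
print(explanation_similarity(before, after))
```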
In AI-assisted decision-making, it is critical for human decision-makers to know when to trust AI and when to trust themselves. However, prior studies calibrated human trust only based on AI confidence indicating AI's correctness likelihood (CL) but ignored humans' CL, hindering optimal team decision-making. To bridge this gap, we proposed to promote humans' appropriate trust based on the CL of both sides at a task-instance level. We first modeled humans' CL by approximating their decision-making models and computing their potential performance in similar instances. We demonstrated the feasibility and effectiveness of our model via two preliminary studies. Then, we proposed three CL exploitation strategies to calibrate users' trust explicitly/implicitly in the AI-assisted decision-making process. Results from a between-subjects experiment (N=293) showed that our CL exploitation strategies promoted more appropriate human trust in AI, compared with only using AI confidence. We further provided practical implications for more human-compatible AI-assisted decision-making.
https://doi.org/10.1145/3544548.3581058
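A minimal sketch, assuming the general idea in the abstract: estimate the human's correctness likelihood (CL) on a new task instance from their recorded accuracy on similar past instances, then compare it with the AI's confidence. The distance metric, neighborhood size, and decision rule below are assumptions for illustration, not the authors' exact model or exploitation strategies.

```python
# Assumed sketch of instance-level human CL estimation via nearest neighbors.
import numpy as np

def human_cl(x_new: np.ndarray,
             past_x: np.ndarray,        # (n, d) features of past instances
             past_correct: np.ndarray,  # (n,) 1 if the human decided correctly
             k: int = 5) -> float:
    """Estimate human CL as accuracy on the k most similar past instances."""
    dists = np.linalg.norm(past_x - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(past_correct[nearest].mean())

def who_to_trust(x_new, past_x, past_correct, ai_confidence: float) -> str:
    """Compare estimated human CL against AI confidence for this instance."""
    cl = human_cl(np.asarray(x_new), np.asarray(past_x), np.asarray(past_correct))
    return "lean on AI" if ai_confidence > cl else "lean on own judgment"

rng = np.random.default_rng(0)
past_x = rng.normal(size=(50, 4))
past_correct = rng.integers(0, 2, size=50)
print(who_to_trust(rng.normal(size=4), past_x, past_correct, ai_confidence=0.8))
```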
Algorithm aversion occurs when humans are reluctant to use algorithms despite their superior performance. Studies show that giving users outcome control by providing agency over how models’ predictions are incorporated into decision-making mitigates algorithm aversion. We study whether algorithm aversion is mitigated by process control, wherein users can decide what input factors and algorithms to use in model training. We conduct a replication study of outcome control, and test novel process control study conditions on Amazon Mechanical Turk (MTurk) and Prolific. Our results partly confirm prior findings on the mitigating effects of outcome control, while also highlighting reproducibility challenges. We find that process control in the form of choosing the training algorithm mitigates algorithm aversion, but changing inputs does not. Furthermore, giving users both outcome and process control does not reduce algorithm aversion more than outcome or process control alone. This study contributes to design considerations around mitigating algorithm aversion.
https://doi.org/10.1145/3544548.3581253
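A hypothetical illustration of what "process control" could look like in code: the participant picks the training algorithm (and could similarly restrict input features) before the model is fit. The abstract does not describe the study's interface, models, or data; everything below is an assumed stand-in.

```python
# Assumed illustration of process control: user-selected training algorithm.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

ALGORITHMS = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4),
}

def train_with_process_control(choice: str) -> float:
    """Fit the algorithm the user chose and report held-out accuracy."""
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = ALGORITHMS[choice]   # the user's process-control decision
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)

print(train_with_process_control("decision_tree"))
```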
Games that feature multiple players, limited communication, and partial information are particularly challenging for AI agents. In the cooperative card game Hanabi, which possesses all of these attributes, AI agents fail to achieve scores comparable to even first-time human players. Through an observational study of three mixed-skill Hanabi play groups, we identify the techniques used by humans that help to explain their superior performance compared to AI. These concern physical artefact manipulation, coordination play, role establishment, and continual rule negotiation. Our findings extend previous accounts of human performance in Hanabi, which are purely in terms of theory-of-mind reasoning, by revealing more precisely how this form of collective decision-making is enacted in skilled human play. Our interpretation points to a gap in the current capabilities of AI agents to perform cooperative tasks.
https://doi.org/10.1145/3544548.3581550