The spread of AI-embedded systems involved in human decision making makes studying human trust in these systems critical. However, empirically investigating trust is challenging. One reason is the lack of standard protocols to design trust experiments. In this paper, we present a survey of existing methods to empirically investigate trust in AI-assisted decision making and analyse the corpus along the constitutive elements of an experimental protocol. We find that the definition of trust is not commonly integrated into experimental protocols, which can lead to findings that are overclaimed or are hard to interpret and compare across studies. Drawing from empirical practices in social and cognitive studies on human-human trust, we provide practical guidelines to improve the methodology of studying Human-AI trust in decision-making contexts. In addition, we bring forward research opportunities of two types: one focusing on further investigation of trust methodologies and the other on factors that impact Human-AI trust.
https://doi.org/10.1145/3476068
As the amount of information online continues to grow, a correspondingly important opportunity is for individuals to reuse knowledge which has been summarized by others rather than starting from scratch. However, appropriate reuse requires judging the relevance, trustworthiness, and thoroughness of others' knowledge in relation to an individual's goals and context. In this work, we explore augmenting judgements of the appropriateness of reusing knowledge in the domain of programming, specifically of reusing artifacts that result from other developers' searching and decision making. Through an analysis of prior research on sensemaking and trust, along with new interviews with developers, we synthesized a framework for reuse judgements. The interviews also validated that developers express a desire for help with judging whether to reuse an existing decision. From this framework, we developed a set of techniques for capturing the initial decision maker's behavior and visualizing signals calculated based on the behavior, to facilitate subsequent consumers' reuse decisions, instantiated in a prototype system called Strata. Results of a user study suggest that the system significantly improves the accuracy, depth, and speed of reusing decisions. These results have implications for systems involving user-generated content in which other users need to evaluate the relevance and trustworthiness of that content.
https://doi.org/10.1145/3449240
In this paper, we report results from fieldwork in the context of municipalities and governmental institutions looking to implement algorithmic decision-making in public service provision. We empirically investigate bureaucratic decision-making practices in the context of governmental job placement, a core public service in many countries, from the perspective of caseworkers. Acting as participants in a large cross-disciplinary research project between 2019-2020 (ongoing), we set up a participatory workshop with caseworkers. This was followed up by in situ interviews that allowed the caseworkers to think-aloud while guiding us as we talked through the decision-making process in governmental job placement. The paper's contribution is a conceptualization of the characteristics of bureaucratic decision-making in the context of human-AI collaboration: 1) processual decisions that move forward the caseworker's understanding of the individual case; 2) formal decisions whereby caseworkers close a bureaucratic process or individual's application; 3) balancing decisions in which the caseworker weighs the potential consequences when a decision is uncertain or questionable. The application of human-AI collaboration in job placement, we argue in this paper, must take the kind of bureaucratic decision (processual, formal, or balancing) into consideration, as we consider from a CSCW perspective how to integrate AI into the already-complicated human workflow of bureaucratic decision-making.
https://doi.org/10.1145/3449114
Algorithms have permeated throughout civil government and society, where they are being used to make high-stakes decisions about human lives. In this paper, we first develop a cohesive framework of algorithmic decision-making adapted for the public sector (ADMAPS) that reflects the complex socio-technical interactions between human discretion, bureaucratic processes, and algorithmic decision-making by synthesizing disparate bodies of work in the fields of Human-Computer Interaction (HCI), Science and Technology Studies (STS), and Public Administration (PA). We then applied the ADMAPS framework to conduct a qualitative analysis of an in-depth, eight-month ethnographic case study of the algorithms in daily use within a child-welfare agency that serves approximately 900 families and 1300 children in the mid-western United States. Overall, we found that there is a need to focus on strength-based algorithmic outcomes centered in social ecological frameworks. In addition, algorithmic systems need to support existing bureaucratic processes and augment human discretion, rather than replace it. Finally, collective buy-in for algorithmic systems requires trust in the target outcomes at both the practitioner and bureaucratic levels. As a result of our study, we propose guidelines for the design of high-stakes algorithmic decision-making tools in the child-welfare system, and more generally, in the public sector. We empirically validate the theoretically derived ADMAPS framework to demonstrate how it can be useful for systematically making pragmatic decisions about the design of algorithms for the public sector.
https://doi.org/10.1145/3476089
As the use of algorithmic systems in high-stakes decision-making increases, the ability to contest algorithmic decisions is being recognised as an important safeguard for individuals. Yet, there is little guidance on what 'contestability' (the ability to contest decisions) in relation to algorithmic decision-making requires. Recent research presents different conceptualisations of contestability in algorithmic decision-making. We contribute to this growing body of work by describing and analysing the perspectives of people and organisations who made submissions in response to Australia's proposed 'AI Ethics Framework', the first framework of its kind to include 'contestability' as a core ethical principle. Our findings reveal that while the nature of contestability is disputed, it is seen as a way to protect individuals, and it resembles contestability in relation to human decision-making. We analyse and consider the implications of these findings.
https://doi.org/10.1145/3449180
Although AI holds promise for improving human decision making in societally critical domains, it remains an open question how human-AI teams can reliably outperform AI alone and human alone in challenging prediction tasks (also known as complementary performance). We explore two directions to understand the gaps in achieving complementary performance. First, we argue that the typical experimental setup limits the potential of human-AI teams. To account for lower AI performance out-of-distribution than in-distribution because of distribution shift, we design experiments with different distribution types and investigate human performance for both in-distribution and out-of-distribution examples. Second, we develop novel interfaces to support interactive explanations so that humans can actively engage with AI assistance. Using virtual pilot studies and large-scale randomized experiments across three tasks, we demonstrate a clear difference between in-distribution and out-of-distribution examples, and observe mixed results for interactive explanations: while interactive explanations improve human perception of AI assistance's usefulness, they may reinforce human biases and lead to limited performance improvement. Overall, our work points out critical challenges and future directions towards enhancing human performance with AI assistance.
https://doi.org/10.1145/3479552
Governments are increasingly turning to algorithmic risk assessments when making important decisions (such as whether to release criminal defendants before trial). Policymakers assert that providing public servants with algorithms will improve human risk predictions and thereby lead to better (e.g., fairer) decisions. Yet because many policy decisions require balancing risk-reduction with competing goals, improving the accuracy of predictions may not necessarily improve the quality of decisions. Through an experiment with 2,140 lay participants simulating two high-stakes government contexts, we interrogate the assumption that improving human prediction accuracy with risk assessments will improve human decisions. We provide the first direct evidence that risk assessments can systematically alter how people factor risk into their decisions. These shifts counteract the potential benefits of improved prediction accuracy. In the pretrial setting of our experiment, the risk assessment made participants more sensitive to increases in perceived risk when making decisions; this shift increased the racial disparity in pretrial detention by 1.9%. In the government home improvement loans setting of our experiment, the risk assessment made participants more risk-averse when making decisions; this shift reduced government aid by 8.3%. These results demonstrate the potential limits and harms of efforts to improve public policy by incorporating predictive algorithms into multifaceted policy decisions. If these observed behaviors occurred in practice, presenting algorithms to public servants would generate unexpected and unjust shifts in public policy without being subject to democratic deliberation or oversight.
https://doi.org/10.1145/3479562
The increased use of algorithms to support decision making raises questions about whether people prefer algorithmic or human input when making decisions. Two streams of research on algorithm aversion and algorithm appreciation have yielded contradicting results. Our work attempts to reconcile this conflict by focusing on the framings of humans and algorithms as a mechanism. In three decision making experiments, we created an algorithm appreciation result (Experiment 1) as well as an algorithm aversion result (Experiment 2) by manipulating only the description of the human agent and the algorithmic agent, and we demonstrated how different choices of framings can lead to inconsistent results in previous studies (Experiment 3). We also showed that these results were mediated by the agent's perceived competence, i.e., expert power. The results provide insights into the divergence of the algorithm aversion and algorithm appreciation literature. We hope to shift the attention from these two contradicting phenomena to how we can better design the framing of algorithms. We also call the attention of the community to the theory of power sources, as it can be a useful framework for designing algorithmic decision support systems.
https://doi.org/10.1145/3479864