Coping with AI: not agAIn!

https://doi.org/10.1145/3313831.3376257

Understanding how racial information impacts human decision making in online systems is critical in today's world. Prior work revealed that race information of criminal defendants, when presented as a text field, had no significant impact on users' judgements of recidivism. We replicated and extended this work to explore how and when race information influences users' judgements, with respect to the saliency of presentation. Our results showed that adding photos to the race labels had a significant impact on recidivism predictions for users who identified as female, but not for those who identified as male. The race of the defendant also impacted these results, with black defendants being less likely to be predicted to recidivate compared to white defendants. These results have strong implications for how system designers choose to display race information, and caution researchers to be aware of gender and race effects when using Amazon Mechanical Turk workers.

bias

recidivism

race

gender

crowd work

Mechanical Turk

legal

human-AI collaboration

University of Washington, Seattle, WA, USA

Microsoft Research, Redmond, WA, USA

Cornell University, Ithaca, NY, USA

University of Michigan, Ann Arbor, MI, USA

Microsoft Research, Redmond, WA, USA

10.1145/3313831.3376257

https://doi.org/10.1145/3313831.3376813

Algorithmic decision-making systems are increasingly used throughout the public and private sectors to make important decisions or assist humans in making these decisions with real social consequences. While there has been substantial research in recent years to build fair decision-making algorithms, there has been less research seeking to understand the factors that affect people's perceptions of fairness in these systems, which we argue is also important for their broader acceptance. In this research, we conduct an online experiment to better understand perceptions of fairness, focusing on three sets of factors: algorithm outcomes, algorithm development and deployment procedures, and individual differences. We find that people rate the algorithm as more fair when the algorithm predicts in their favor, even surpassing the negative effects of describing algorithms that are very biased against particular demographic groups. We find that this effect is moderated by several variables, including participants' education level, gender, and several aspects of the development procedure. Our findings suggest that systems that evaluate algorithmic fairness through users' feedback must consider the possibility of "outcome favorability" bias.

perceived fairness

algorithmic decision-making

algorithm outcome

algorithm development

Carnegie Mellon University, Pittsburgh, PA, USA

Amazon, Seattle, WA, USA

Carnegie Mellon University, Pittsburgh, PA, USA

10.1145/3313831.3376813

https://doi.org/10.1145/3313831.3376219

Machine learning (ML) models are now routinely deployed in domains ranging from criminal justice to healthcare. With this newfound ubiquity, ML has moved beyond academia and grown into an engineering discipline. To that end, interpretability tools have been designed to help data scientists and machine learning practitioners better understand how ML models work. However, there has been little evaluation of the extent to which these tools achieve this goal. We study data scientists' use of two existing interpretability tools, the InterpretML implementation of GAMs and the SHAP Python package. We conduct a contextual inquiry (N=11) and a survey (N=197) of data scientists to observe how they use interpretability tools to uncover common issues that arise when building and evaluating ML models. Our results indicate that data scientists over-trust and misuse interpretability tools. Furthermore, few of our participants were able to accurately describe the visualizations output by these tools. We highlight qualitative themes for data scientists' mental models of interpretability tools. We conclude with implications for researchers and tool designers, and contextualize our findings in the social science literature.

Interpretability

Machine learning

User-centric evaluation

University of Michigan, Ann Arbor, MI, USA

Microsoft Research, Seattle, WA, USA

Microsoft Research, Redmond, WA, USA

Microsoft Research, New York City, NY, USA

Microsoft Research, New York, NY, USA

10.1145/3313831.3376219

https://doi.org/10.1145/3313831.3376316

As more and more forms of AI become prevalent, it becomes increasingly important to understand how people develop mental models of these systems. In this work we study people's mental models of AI in a cooperative word guessing game. We run think-aloud studies in which people play the game with an AI agent; through thematic analysis we identify features of the mental models developed by participants. In a large-scale study we have participants play the game with the AI agent online and use a post-game survey to probe their mental model. We find that those who win more often have better estimates of the AI agent's abilities. We present three components for modeling AI systems, propose that understanding the underlying technology is insufficient for developing appropriate conceptual models (analysis of behavior is also necessary), and suggest future work for studying the revision of mental models over time.

Artificial intelligence

mental models

conceptual models

games

word games

AI agents

think-aloud

Columbia University, New York City, NY, USA

IBM Research AI, Yorktown Heights, NY, USA

IBM Research AI, Cambridge, MA, USA

IBM Watson, Cambridge, MA, USA

IBM Research AI, Yorktown, NY, USA

10.1145/3313831.3376316