Datasheets for Datasets help ML engineers notice and understand ethical issues in training data

Abstract

The social computing community has demonstrated interest in the ethical issues sometimes produced by machine learning (ML) models, like violations of privacy, fairness, and accountability. This paper discovers what kinds of ethical considerations machine learning engineers recognize, how they build understanding, and what decisions they make when working with a real-world dataset. In particular, it illustrates ways in which Datasheets for Datasets, an accountability intervention designed to help engineers explore unfamiliar training data, scaffolds the process of issue discovery, understanding, and ethical decision-making. Participants were asked to review an intentionally ethically problematic dataset and asked to think aloud as they used it to solve a given ML problem. Out of 23 participants, 11 were given a Datasheet they could use while completing the task. Participants were ethically sensitive enough to identify concerns in the dataset; participants who had a Datasheet did open and refer to it; and those with Datasheets mentioned ethical issues during the think-aloud earlier and more often than those without. The think-aloud protocol offered a grounded description of how participants recognized, understood, and made a decision about ethical problems in an unfamiliar dataset. The method used in this study can test other interventions that claim to encourage recognition, promote understanding, and support decision-making among technologists.

Award
Honorable Mention
Author
Karen L. Boyd
University of Michigan, Ann Arbor, Michigan, United States
Paper URL

https://doi.org/10.1145/3479582

Conference: CSCW2021

The 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing

Session: Data Work and AI

Papers Room B
8 presentations
2021-10-27 22:30:00
2021-10-28 00:00:00