Data Work and AI

Conference
CSCW 2021
Understanding Human-side Impact of Sampling Image Batches in Subjective Attribute Labeling
Abstract

As the application areas of image-based classifiers diversify, capturing human annotators' subjective responses during data acquisition is becoming crucial. In such scenarios, however, eliciting responses from human labelers in a reliable and cost-efficient manner has been a significant challenge. To bridge this gap, we seek to understand how different sequencing strategies in batch image labeling affect human annotators' labeling performance. In particular, we developed three sequencing strategies: (1) uncertainty-based labeling (UL), which prioritizes images that a classifier predicts with the highest uncertainty; (2) certainty-based labeling (CL), the reverse of UL, which presents images with the highest prediction probability first; and (3) random, a baseline that chooses images at random when forming batches. Although UL and CL select the images to be surfaced from a classifier's point of view, we hypothesized that human annotators' perception and labeling performance may vary across the sequencing strategies. In our study, participants perceived different levels of cognitive load across the three conditions (CL the easiest and UL the hardest), and we found a trade-off between labeling reliability (CL and UL more reliable than random) and task efficiency (UL the most efficient, CL the least). Based on the results, we discuss design implications for data scientists who may consider applying different sequencing strategies when collecting image labels at scale for subjective tasks, and we suggest possible areas for future research.
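The three sequencing strategies can be sketched as a simple ordering over the classifier's confidence scores. This is an illustrative reconstruction, not the authors' implementation; `predict_proba` is a hypothetical callable assumed to return the classifier's probability for its top predicted class.

```python
import random

def sequence_batches(images, predict_proba, strategy, batch_size):
    """Order images into batches under one of the three strategies
    described in the abstract: "UL" (least confident first),
    "CL" (most confident first), or "random" (shuffled baseline).
    """
    if strategy == "random":
        ordered = images[:]
        random.shuffle(ordered)
    else:
        # Classifier confidence for each image's most likely class.
        scored = [(predict_proba(img), img) for img in images]
        # UL sorts ascending (highest uncertainty first);
        # CL sorts descending (highest certainty first).
        reverse = (strategy == "CL")
        ordered = [img for _, img in
                   sorted(scored, key=lambda s: s[0], reverse=reverse)]
    # Split the ordered sequence into fixed-size batches.
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]
```

For example, with confidences {a: 0.9, b: 0.6, c: 0.51, d: 0.99}, UL surfaces c and b in the first batch, while CL surfaces d and a first.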

Authors
Sungsoo Ray Hong
George Mason University, Fairfax, Virginia, United States
Chaeyeon Chung
KAIST, Daejeon, Korea, Republic of
Jung Soo Lee
KAIST, Daejeon, Korea, Republic of
Kyungmin Park
Shinhan Bank, Seoul, Korea, Republic of
Junsoo Lee
KAIST, Daejeon, Korea, Republic of
Minjae Kim
NCSOFT, Seongnam-si, Korea, Republic of
Mookyung Song
NCSOFT, Seongnam, Korea, Republic of
Yeonwoo Kim
NCSOFT, Seoul, Korea, Republic of
Jaegul Choo
KAIST, Daejeon, Korea, Republic of
Paper URL

https://doi.org/10.1145/3476037

Video
AI-Assisted Human Labeling: Batching for Efficiency without Overreliance
Abstract

Human labeling of training data is often a time-consuming, expensive part of machine learning. In this paper, we study "batch labeling", an AI-assisted UX paradigm that aids data labelers by allowing a single labeling action to apply to multiple records. We ran a large-scale study on Mechanical Turk with 156 participants to investigate labeler-AI-batching system interaction. We investigate the efficacy of the system compared to a single-item labeling interface (i.e., labeling one record at a time), and evaluate the impact of batch labeling on accuracy and time. We further investigate the impact of AI algorithm quality, its effects on labelers' overreliance, and potential mechanisms for mitigating that overreliance. Our work offers implications for the design of batch labeling systems and for work practices focusing on labeler-AI-batching system interaction.

Authors
Zahra Ashktorab
IBM Research, Yorktown Heights, New York, United States
Casey Dugan
IBM Research, Cambridge, Massachusetts, United States
Aabhas Sharma
IBM Research, Cambridge, Massachusetts, United States
Evelyn Duesterwald
IBM Research, Yorktown Heights, New York, United States
Michael Muller
IBM Research, Cambridge, Massachusetts, United States
Michael Desmond
IBM Research, Yorktown Heights, New York, United States
Christine Wolf
IBM Research - Almaden, San Jose, California, United States
Josh Andres
IBM Research Australia, Melbourne, Victoria, Australia
Narendra Nath Joshi
IBM, Cambridge, Massachusetts, United States
Kristina Brimijoin
Hastings on Hudson, New York, United States
Werner Geyer
IBM Research, Cambridge, Massachusetts, United States
Qian Pan
IBM Research, Cambridge, Massachusetts, United States
Darrell Reimer
IBM Research AI, Yorktown Heights, New York, United States
Michelle Brachman
IBM Research, Cambridge, Massachusetts, United States
Paper URL

https://doi.org/10.1145/3449163

Video
"We Would Never Write That Down": Classifications of Unemployed and Data Challenges for AI
Abstract

This paper draws attention to new complexities of deploying AI systems in sensitive contexts, such as welfare allocation. AI is increasingly used in public administration with the promise of improving decision-making. To succeed, it needs all the criteria used as part of decisions, formal and informal. In this paper, we empirically explore the informal classifications used by caseworkers to make unemployed welfare seekers ‘fit’ into the formal categories in a Danish job centre. Our findings show that the classifications used by caseworkers are documentable, and hence traceable to AI. To the caseworkers, however, these classifications are at odds with the stable explanations assumed by any recording system, as they involve negotiated and situated judgments of people’s character. Thus, for moral reasons, caseworkers find them ill-suited for formal representation and would never write them down. As a result, AI is denuded of the real-world (and real work) nature of decision-making. This matters to CSCW because the issue is not only whether AI can ‘do’ decision-making, as previous research suggests. In this paper, we show that problems may also be caused by the unwillingness of people to provide the data these systems need. The purpose of this paper is to present the empirical results of this research, followed by a discussion of implications for AI-supported practice and future research.

Award
Honorable Mention
Authors
Anette C. M. Petersen
IT University of Copenhagen, Copenhagen, Denmark
Lars Rune Christensen
IT University of Copenhagen, Copenhagen, Denmark
Richard Harper
Lancaster University, Lancaster, United Kingdom
Thomas Hildebrandt
University of Copenhagen, Copenhagen, Denmark
Paper URL

https://doi.org/10.1145/3449176

Can “Conscious Data Contribution” Help Users to Exert “Data Leverage” Against Technology Companies?
Abstract

Tech users currently have limited ability to act on concerns around the negative societal impacts of large tech companies. However, recent work suggests users can exert leverage using their role as contributors of valuable data, for instance by withholding their data contributions to intelligent technologies. We propose and evaluate a new means to exert leverage against tech companies: “conscious data contribution” (CDC). Users who participate in CDC exert leverage against an offending tech company by contributing data to technologies operated by a competitor of that offending company. Using simulations, we find that CDC could be highly effective at reducing the gap in performance of intelligent technologies between an incumbent and their competitors. In some cases, just 20% of users contributing their data to a small competitor could help that competitor get 80% of the way towards best-case performance. We discuss the implications of CDC for policymakers, tech designers, and researchers.
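The intuition behind CDC's outsized effect can be sketched with a toy model: if model quality grows with data but with diminishing returns, a modest fraction of users' data closes a disproportionate share of the performance gap. The power-law learning curve and its exponent below are illustrative assumptions, not the paper's actual simulation model.

```python
def performance(n_users, exponent=0.3):
    """Toy concave 'learning curve': model quality grows with the
    amount of contributed data, but with diminishing returns.
    The power-law form and exponent are illustrative assumptions.
    """
    return n_users ** exponent

def cdc_gap_closed(total_users, fraction_contributing, exponent=0.3):
    """Fraction of best-case performance a small competitor reaches
    when `fraction_contributing` of all users consciously contribute
    their data to it, relative to an incumbent with everyone's data.
    """
    best = performance(total_users, exponent)
    competitor = performance(total_users * fraction_contributing, exponent)
    return competitor / best
```

Under this toy model, the ratio reduces to `fraction_contributing ** exponent`: with an exponent of 0.3, 20% of users already closes about 62% of the gap, and with a flatter curve (exponent near 0.14) the same 20% reaches roughly 80% of best-case performance, echoing the figure the abstract reports — though those numbers come from this sketch, not the paper's simulations.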

Authors
Nicholas Vincent
Northwestern University, Evanston, Illinois, United States
Brent Hecht
Northwestern University, Evanston, Illinois, United States
Paper URL

https://doi.org/10.1145/3449177

Video
Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development
Abstract

Data is a crucial component of machine learning; the field is reliant on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings—and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision’s propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation—how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both a structured and thematic content analysis, we document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identify sit in opposition with social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.

Award
Best Paper
Authors
Morgan Klaus Scheuerman
University of Colorado Boulder, Boulder, Colorado, United States
Alex Hanna
Google, Sunnyvale, California, United States
Emily Denton
Google, New York, New York, United States
Paper URL

https://doi.org/10.1145/3476058

Video
Datasheets for Datasets help ML engineers notice and understand ethical issues in training data
Abstract

The social computing community has demonstrated interest in the ethical issues sometimes produced by machine learning (ML) models, like violations of privacy, fairness, and accountability. This paper examines what kinds of ethical considerations machine learning engineers recognize, how they build understanding, and what decisions they make when working with a real-world dataset. In particular, it illustrates ways in which Datasheets for Datasets, an accountability intervention designed to help engineers explore unfamiliar training data, scaffolds the process of issue discovery, understanding, and ethical decision-making. Participants were asked to review an intentionally ethically problematic dataset and to think aloud as they used it to solve a given ML problem. Out of 23 participants, 11 were given a Datasheet they could use while completing the task. Participants were ethically sensitive enough to identify concerns in the dataset; participants who had a Datasheet did open and refer to it; and those with Datasheets mentioned ethical issues during the think-aloud earlier and more often than those without. The think-aloud protocol offered a grounded description of how participants recognized, understood, and made decisions about ethical problems in an unfamiliar dataset. The method used in this study can test other interventions that claim to encourage recognition, promote understanding, and support decision-making among technologists.

Award
Honorable Mention
Authors
Karen L. Boyd
University of Michigan, Ann Arbor, Michigan, United States
Paper URL

https://doi.org/10.1145/3479582

Video
Automatically Labeling Low Quality Content on Wikipedia By Leveraging Patterns in Editing Behaviors
Abstract

Wikipedia articles aim to be definitive sources of encyclopedic content. Yet, only 0.6% of Wikipedia articles are rated high quality on its quality scale, owing to an insufficient number of Wikipedia editors relative to the enormous number of articles. Supervised Machine Learning (ML) quality improvement approaches that can automatically identify and fix content issues rely on manual labels of individual Wikipedia sentence quality. However, current labeling approaches are tedious and produce noisy labels. Here, we propose an automated labeling approach that identifies the semantic category of historic Wikipedia edits (e.g., adding citations, clarifications) and uses the sentences as they were before each edit as examples requiring that semantic improvement. Sentences from the highest-rated articles serve as examples that need no further semantic improvement. We show that training existing sentence quality classification algorithms on our labels improves their performance compared to training them on existing labels. Our work shows that the editing behaviors of Wikipedia editors provide better labels than those generated by crowdworkers, who lack the context to make judgments that the editors would agree with.
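The labeling scheme described above can be sketched as a simple transformation over edit history: the pre-edit sentence becomes a positive example for the edit's semantic category, and sentences from the highest-rated articles become negatives. The dict layout of `edits` and the `"no-improvement-needed"` label are illustrative assumptions, not the authors' data schema.

```python
def build_labels(edits, featured_sentences):
    """Derive sentence-quality training labels from edit history.

    `edits` is a list of dicts like
        {"before": "Smoking is healthy.", "category": "citation"}
    where "before" is the sentence prior to a semantic edit and
    "category" is the kind of improvement the edit made.
    `featured_sentences` are sentences from the highest-rated
    articles, treated as needing no further improvement.
    """
    labeled = []
    for edit in edits:
        # The pre-edit sentence is evidence it needed this improvement.
        labeled.append((edit["before"], edit["category"]))
    for sentence in featured_sentences:
        labeled.append((sentence, "no-improvement-needed"))
    return labeled
```

The resulting (sentence, label) pairs could then feed any existing sentence quality classifier, which is how the abstract's comparison against crowdworker labels would be run.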

Authors
Sumit Asthana
University of Michigan, Ann Arbor, Michigan, United States
Sabrina Tobar Thommel
University of Michigan, Ann Arbor, Michigan, United States
Aaron Lee Halfaker
Microsoft, Redmond, Washington, United States
Nikola Banovic
University of Michigan, Ann Arbor, Michigan, United States
Paper URL

https://doi.org/10.1145/3479503

Video
Nowcasting Gentrification Using Airbnb Data
Abstract

There is a rumbling debate over the impact of gentrification: presumed gentrifiers have been the target of protests and attacks in some cities, while they have been welcomed as generators of new jobs and taxes in others. Census data fails to measure neighborhood change in real time, since it is usually updated every ten years. This work shows that Airbnb data can be used to quantify and track neighborhood changes. Specifically, we consider both structured data (e.g., number of listings, number of reviews, listing information) and unstructured data (e.g., user-generated reviews processed with natural language processing and machine learning algorithms) for three major cities: New York City (US), Los Angeles (US), and Greater London (UK). We find that Airbnb data (especially its unstructured part) appears to nowcast neighborhood gentrification, measured as changes in housing affordability and demographics. Overall, our results suggest that user-generated data from online platforms can be used to create socioeconomic indices to complement traditional measures that are less granular, not available in real time, and more costly to obtain.

Authors
Shomik Jain
University of Southern California, Los Angeles, California, United States
Davide Proserpio
University of Southern California, Los Angeles, California, United States
Giovanni Quattrone
Middlesex University, London, United Kingdom
Daniele Quercia
Nokia Bell Labs, Cambridge, United Kingdom
Paper URL

https://doi.org/10.1145/3449112