Politics of Datasets

Conference Name
CHI 2024
The Cadaver in the Machine: The Social Practices of Measurement and Validation in Motion Capture Technology
Abstract

Motion capture systems, used across various domains, make body representations concrete through technical processes. We argue that the measurement of bodies and the validation of measurements for motion capture systems can be understood as social practices. By analyzing the findings of a systematic literature review (N=278) through the lens of social practice theory, we show how these practices, and their varying attention to errors, become ingrained in motion capture design and innovation over time. Moreover, we show how contemporary motion capture systems perpetuate assumptions about human bodies and their movements. We suggest that social practices of measurement and validation are ubiquitous in the development of data- and sensor-driven systems more broadly, and provide this work as a basis for investigating hidden design assumptions and their potential negative consequences in human-computer interaction.

Award
Honorable Mention
Authors
Emma Harvey
Cornell University, Ithaca, New York, United States
Hauke Sandhaus
Cornell University, Ithaca, New York, United States
Abigail Jacobs
University of Michigan, Ann Arbor, Michigan, United States
Emanuel Moss
Intel Labs, Hillsboro, Oregon, United States
Mona Sloane
University of Virginia, Charlottesville, Virginia, United States
Paper URL

https://doi.org/10.1145/3613904.3642004

Video
Aligning Data with the Goals of an Organization and Its Workers: Designing Data Labeling for Social Service Case Notes
Abstract

The challenges of data collection in nonprofits for performance and funding reports are well-established in HCI research. Few studies, however, delve into improving the data collection process. Our study proposes ideas to improve data collection by exploring challenges that social workers experience when labeling their case notes. Through collaboration with an organization that provides intensive case management to those experiencing homelessness in the U.S., we conducted interviews with caseworkers and held design sessions where caseworkers, managers, and program analysts examined storyboarded ideas to improve data labeling. Our findings suggest several design ideas for improving data labeling practices: aligning labeling with caseworker goals, enabling shared control over data label design for a comprehensive portrayal of caseworker contributions, improving the synthesis of qualitative and quantitative data, and making labeling user-friendly. We contribute design implications for data labeling to better support multiple stakeholder goals in social service contexts.

Authors
Apoorva Gondimalla
The University of Texas at Austin, Austin, Texas, United States
Varshinee Sreekanth
The University of Texas at Austin, Austin, Texas, United States
Govind Joshi
The University of Texas at Austin, Austin, Texas, United States
Whitney Nelson
The University of Texas at Austin, Austin, Texas, United States
Eunsol Choi
The University of Texas at Austin, Austin, Texas, United States
Stephen C. Slota
The University of Texas at Austin, Austin, Texas, United States
Sherri Greenberg
The University of Texas at Austin, Austin, Texas, United States
Kenneth R. Fleischmann
The University of Texas at Austin, Austin, Texas, United States
Min Kyung Lee
The University of Texas at Austin, Austin, Texas, United States
Paper URL

https://doi.org/10.1145/3613904.3642014

Video
The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases
Abstract

While colonization has sociohistorically impacted people's identities across various dimensions, those colonial values and biases continue to be perpetuated by sociotechnical systems. One category of sociotechnical systems, sentiment analysis tools, can also perpetuate colonial values and biases. Although these tools are often used to guide various practices (e.g., content moderation), little attention has been paid to how they may be complicit in perpetuating coloniality. In this paper, we explore potential bias in sentiment analysis tools in the context of Bengali communities that have experienced and continue to experience the impacts of colonialism. Drawing on the identity categories most impacted by colonialism among local Bengali communities, we focused our analytic attention on gender, religion, and nationality. We conducted an algorithmic audit of all sentiment analysis tools for Bengali available on the Python Package Index (PyPI) and GitHub. Our analyses showed that, even for inputs with similar semantic content and structure, the tools produce inconsistent outputs, exhibit bias across identity categories, and respond differently to different ways of expressing identity. Connecting our findings with the colonially shaped sociocultural structures of Bengali communities, we discuss the implications of downstream bias in sentiment analysis tools.
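For readers curious what such an audit might look like in code, the following is a minimal, hypothetical sketch of a template-based identity audit in the spirit of the abstract: identity terms are swapped into otherwise identical sentences and each tool's score is recorded so outputs can be compared across identity categories. The `ScoreFn` interface, templates, and identity terms are illustrative placeholders, not the authors' actual audit pipeline or any specific PyPI tool's API.

```python
# Hypothetical sketch of a template-based identity audit; the scoring
# functions, templates, and identity terms are illustrative placeholders,
# not the authors' pipeline or any real tool's API.
from itertools import product
from typing import Callable, Dict, List

# Assumed interface: each tool maps a sentence to a sentiment score in [-1, 1].
ScoreFn = Callable[[str], float]

def audit(tools: Dict[str, ScoreFn], templates: List[str],
          identity_terms: Dict[str, List[str]]) -> List[dict]:
    """Swap identity terms into otherwise identical templates and record
    each tool's score, enabling comparison across identity categories."""
    rows = []
    for tool_name, score in tools.items():
        for template, (category, terms) in product(templates, identity_terms.items()):
            for term in terms:
                sentence = template.format(identity=term)
                rows.append({
                    "tool": tool_name,
                    "category": category,
                    "term": term,
                    "sentence": sentence,
                    "score": score(sentence),
                })
    return rows

if __name__ == "__main__":
    # Placeholder inputs (English shown for readability; the actual audit
    # used Bengali sentences and identity expressions).
    tools = {"toy_tool": lambda s: 0.0}  # stand-in for a real sentiment tool
    templates = ["I met a {identity} person today."]
    identity_terms = {"religion": ["Hindu", "Muslim"],
                      "gender": ["man", "woman"]}
    for row in audit(tools, templates, identity_terms):
        print(row)
```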

Authors
Dipto Das
University of Colorado Boulder, Boulder, Colorado, United States
Shion Guha
University of Toronto, Toronto, Ontario, Canada
Jed R. Brubaker
University of Colorado Boulder, Boulder, Colorado, United States
Bryan Semaan
University of Colorado Boulder, Boulder, Colorado, United States
Paper URL

https://doi.org/10.1145/3613904.3642669

Video
Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM
Abstract

Data analysts have long sought to turn unstructured text data into meaningful concepts. Though common, topic modeling and clustering focus on lower-level keywords and require significant interpretative work. We introduce concept induction, a computational process that instead produces high-level concepts, defined by explicit inclusion criteria, from unstructured text. For a dataset of toxic online comments, where a state-of-the-art BERTopic model outputs “women, power, female,” concept induction produces high-level concepts such as “Criticism of traditional gender roles” and “Dismissal of women's concerns.” We present LLooM, a concept induction algorithm that leverages large language models to iteratively synthesize sampled text and propose human-interpretable concepts of increasing generality. We then instantiate LLooM in a mixed-initiative text analysis tool, enabling analysts to shift their attention from interpreting topics to engaging in theory-driven analysis. Through technical evaluations and four analysis scenarios ranging from literature review to content moderation, we find that LLooM’s concepts improve upon the prior art of topic models in terms of quality and data coverage. In expert case studies, LLooM helped researchers to uncover new insights even from familiar datasets, for example by suggesting a previously unnoticed concept of attacks on out-party stances in a political social media dataset.
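The abstract describes the core loop only at a high level; the sketch below is a rough, hypothetical rendering of concept induction under stated assumptions. Here `call_llm` is a placeholder for any LLM chat-completion call, and the prompts, sampling strategy, and JSON schema are illustrative rather than the authors' LLooM implementation; see the paper for the actual algorithm.

```python
# Rough, hypothetical sketch of a concept-induction loop: repeatedly sample
# texts, ask an LLM to synthesize them, and collect named concepts with
# explicit inclusion criteria. Not the authors' LLooM implementation.
import json
import random
from typing import Callable, Dict, List

def propose_concepts(texts: List[str], call_llm: Callable[[str], str],
                     sample_size: int = 20, rounds: int = 3) -> List[Dict[str, str]]:
    """Iteratively sample texts and ask the LLM for high-level concepts,
    each defined by a name and explicit inclusion criteria."""
    concepts: List[Dict[str, str]] = []
    for _ in range(rounds):
        sample = random.sample(texts, min(sample_size, len(texts)))
        prompt = (
            "Identify the high-level ideas that recur in the texts below. "
            "Return a JSON list of objects with 'name' and 'inclusion_criteria' "
            "fields, where the criteria state explicitly when a text matches.\n\n"
            + "\n---\n".join(sample)
        )
        concepts.extend(json.loads(call_llm(prompt)))
    return concepts

def matches_concept(text: str, concept: Dict[str, str],
                    call_llm: Callable[[str], str]) -> bool:
    """Score a single text against a concept by applying its inclusion criteria."""
    prompt = (f"Concept: {concept['name']}\n"
              f"Inclusion criteria: {concept['inclusion_criteria']}\n"
              f"Does the following text match? Answer yes or no.\n\n{text}")
    return call_llm(prompt).strip().lower().startswith("yes")
```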

Authors
Michelle S. Lam
Stanford University, Stanford, California, United States
Janice Teoh
Stanford University, Stanford, California, United States
James A. Landay
Stanford University, Stanford, California, United States
Jeffrey Heer
University of Washington, Seattle, Washington, United States
Michael S. Bernstein
Stanford University, Stanford, California, United States
Paper URL

https://doi.org/10.1145/3613904.3642830

Video
Situating Datasets: Making Public Eviction Data Actionable for Housing Justice
Abstract

Activists, governments, and academics regularly advocate for more open data. But how is data made open, and for whom is it made useful and usable? In this paper, we investigate and describe the work of making eviction data open to tenant organizers. We do this through an ethnographic description of ongoing work with a local housing activist organization. This work combines observation, direct participation in data work, and creating media artifacts, specifically digital maps. Our interpretation is grounded in D’Ignazio and Klein’s Data Feminism, emphasizing standpoint theory. Through our analysis and discussion, we highlight how shifting positionalities from data intermediaries to data accomplices affects the design of data sets and maps. We provide HCI scholars with three design implications when situating data for grassroots organizers: becoming a domain beginner, striving for data actionability, and evaluating our design artifacts by the social relations they sustain rather than just their technical efficacy.

Authors
Anh-Ton Tran
Georgia Institute of Technology, Atlanta, Georgia, United States
Grace Guo
Georgia Institute of Technology, Atlanta, Georgia, United States
Jordan Taylor
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Katsuki Andrew Chan
Georgia Institute of Technology, Atlanta, Georgia, United States
Elora Lee Raymond
Georgia Institute of Technology, Atlanta, Georgia, United States
Carl DiSalvo
Georgia Institute of Technology, Atlanta, Georgia, United States
Paper URL

https://doi.org/10.1145/3613904.3642452

Video