"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI

Abstract

AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact, impacting predictions like cancer detection, wildlife poaching, and loan allocations. Paradoxically, data is the most under-valued and de-glamorised aspect of AI. In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and the USA. We define, identify, and present empirical evidence on Data Cascades (compounding events causing negative, downstream effects from data issues) triggered by conventional AI/ML practices that undervalue data quality. Data cascades are pervasive (92% prevalence), invisible, delayed, but often avoidable. We discuss HCI opportunities in designing and incentivizing data excellence as a first-class citizen of AI, resulting in safer and more robust systems for all.

Award
Best Paper
Authors
Nithya Sambasivan
Google Research, San Francisco, California, United States
Shivani Kapania
Google Research India, Bangalore, India
Hannah Highfill
Google Inc., Mountain View, California, United States
Diana Akrong
Google Research Accra, Accra, Ghana
Praveen Paritosh
Google, San Francisco, California, United States
Lora M. Aroyo
Google, New York, New York, United States
DOI

10.1145/3411764.3445518

Paper URL

https://doi.org/10.1145/3411764.3445518

Conference: CHI 2021

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2021.acm.org/)

Session: Tech for Specific Situations

[A] Paper Room 13, 2021-05-10 17:00:00~2021-05-10 19:00:00 / [B] Paper Room 13, 2021-05-11 01:00:00~2021-05-11 03:00:00 / [C] Paper Room 13, 2021-05-11 09:00:00~2021-05-11 11:00:00
13 presentations