The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics In Line With Reality

Abstract

Machine learning classifiers for human-facing tasks such as comment toxicity and misinformation often score highly on metrics such as ROC AUC but are received poorly in practice. Why this gap? Today, metrics such as ROC AUC, precision, and recall are used to measure technical performance; however, human-computer interaction observes that evaluation of human-facing systems should account for people's reactions to the system. In this paper, we introduce a transformation that more closely aligns machine learning classification metrics with the values and methods of user-facing performance measures. The disagreement deconvolution takes in any multi-annotator (e.g., crowdsourced) dataset, disentangles stable opinions from noise by estimating intra-annotator consistency, and compares each test set prediction to the individual stable opinions from each annotator. Applying the disagreement deconvolution to existing social computing datasets, we find that current metrics dramatically overstate the performance of many human-facing machine learning tasks: for example, performance on a comment toxicity task is corrected from .95 to .73 ROC AUC.

Authors
Mitchell L. Gordon
Stanford University, Stanford, California, United States
Kaitlyn Zhou
Stanford University, Stanford, California, United States
Kayur Patel
Apple Inc, Seattle, Washington, United States
Tatsunori Hashimoto
Stanford University, Stanford, California, United States
Michael S. Bernstein
Stanford University, Stanford, California, United States
DOI

10.1145/3411764.3445423

Paper URL

https://doi.org/10.1145/3411764.3445423

Conference: CHI 2021

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2021.acm.org/)

Session: Computational Human-AI Conversation

[A] Paper Room 02, 2021-05-11 17:00:00~2021-05-11 19:00:00 / [B] Paper Room 02, 2021-05-12 01:00:00~2021-05-12 03:00:00 / [C] Paper Room 02, 2021-05-12 09:00:00~2021-05-12 11:00:00
14 presentations