We study how the ratings people receive on online labor platforms are influenced by their performance, their gender, their rater's gender, and the displayed ratings of other raters. We conducted a deception study in which participants collaborated on a task with a pair of simulated workers, who varied in gender and performance level, and then rated the workers' performance. When the performance of the paired workers was similar, low-performing female workers were rated lower than their male counterparts. When there was a clear performance difference between the paired workers, low-performing female workers were preferred over similarly performing male peers. Furthermore, displaying an average rating from other raters made ratings more extreme: high-performing workers received significantly higher ratings, and low-performing workers lower ratings, than when average ratings were absent. This work contributes an empirical understanding of when biases in ratings manifest and offers recommendations for how online work platforms can counter these biases.
https://doi.org/10.1145/3313831.3376860
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2020.acm.org/)