Gender microaggressions are subtle yet persistent forms of discrimination in workplace interactions. While LLMs can detect them in written text, it remains poorly understood how their interpretations align with or diverge from human perspectives and experiences. We present a mixed-methods study comparing how LLMs and humans, differing in gender identity and lived experience, interpret gender microaggressions in the workplace. Using short dialogues adapted from real-world accounts, we asked 141 participants to rate the likelihood that a scenario contains a microaggression and to provide a rationale for their answer. The same tasks were completed by seven different LLMs. Our analysis reveals significant differences in how humans and LLMs interpret microaggressions, captured in both ratings and rationales, and, more interestingly, an effect of gender and lived experience on human interpretations. These findings highlight the need for microaggression-detection systems to embrace interpretive plurality and to support reflection and awareness while accounting for ambiguity.
ACM CHI Conference on Human Factors in Computing Systems