Non-experts increasingly engage in user-driven algorithm auditing, interacting directly with AI systems to probe, document, and reflect on biased behavior. Yet auditing remains challenging due to model opacity and limited support for navigating and interpreting outputs. This paper explores the design and evaluation of interfaces grounded in the sensemaking framework to support non-experts in auditing gender bias in image captioning. In a between-subjects study, 60 participants audited an image captioning model using one of three interface conditions: a Baseline interface, an Image Masking Tool for manipulating images, or a Text Filtering Tool for organizing captions. Our findings show that interface design shaped what participants noticed, how they interpreted model behavior, and which hypotheses they formed and tested. The Image Masking Tool enabled fine-grained testing of visual cues and context, while the Text Filtering Tool revealed broader asymmetries in gendered language. We argue that incorporating sensemaking into auditing practices can advance accountability and transparency in machine learning systems.