Supervised machine learning relies on large datasets, often with ground-truth labels annotated by humans. While some data points are easy to classify, others are hard, which reduces inter-annotator agreement. This introduces label noise for the classifier and might affect the user's perception of the classifier's performance. In our research, we investigated whether the classification difficulty of a data point influences how strongly a prediction mistake reduces the "perceived accuracy". In an experimental online study, 225 participants interacted with three fictitious classifiers of equal accuracy (73%). The classifiers made prediction mistakes on three different types of data points (easy, difficult, impossible). After the interaction, participants judged each classifier's accuracy. We found that not all prediction mistakes reduced the perceived accuracy equally. Furthermore, the perceived accuracy differed significantly from the calculated accuracy. To conclude, accuracy and related measures seem unsuitable to represent how users perceive the performance of classifiers.
https://dl.acm.org/doi/abs/10.1145/3491102.3501915
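A minimal sketch of the accuracy arithmetic behind this design, assuming hypothetical item counts and two made-up classifiers (clf_a, clf_b): mistakes concentrated on different difficulty strata can still yield the same calculated accuracy of 73%.

```python
# Hypothetical illustration (not the study's materials): two classifiers whose
# mistakes fall on items of different difficulty, yet whose calculated accuracy
# is identical.

def calculated_accuracy(is_correct):
    """Calculated accuracy = number of correct predictions / number of predictions."""
    return sum(is_correct) / len(is_correct)

# 100 assumed trials, each labelled with a difficulty stratum.
difficulty = ["easy"] * 40 + ["difficult"] * 33 + ["impossible"] * 27

# clf_a answers every easy and difficult item correctly but misses all impossible ones.
clf_a = [d != "impossible" for d in difficulty]
# clf_b instead misses 27 of the easy items and gets everything else right.
clf_b = [not (d == "easy" and i < 27) for i, d in enumerate(difficulty)]

print(calculated_accuracy(clf_a))  # 0.73
print(calculated_accuracy(clf_b))  # 0.73
# Same calculated accuracy, yet the study found that users' perceived accuracy
# differs depending on which kinds of items the mistakes fall on.
```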
Machine learning models need to provide contrastive explanations, since people often seek to understand why a puzzling prediction occurred instead of some expected outcome. Current contrastive explanations are rudimentary comparisons between examples or raw features, which remain difficult to interpret, since they lack semantic meaning. We argue that explanations must be more relatable to other concepts, hypotheticals, and associations. Inspired by the perceptual process from cognitive psychology, we propose the XAI Perceptual Processing framework and RexNet model for relatable explainable AI with Contrastive Saliency, Counterfactual Synthetic, and Contrastive Cues explanations. We investigated the application of vocal emotion recognition, and implemented a modular multi-task deep neural network to predict and explain emotions from speech. From think-aloud and controlled studies, we found that counterfactual explanations were useful and further enhanced with semantic cues, but saliency explanations were not. This work provides insights into providing and evaluating relatable contrastive explainable AI for perception applications.
https://dl.acm.org/doi/abs/10.1145/3491102.3501826
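A schematic PyTorch sketch of how a modular multi-task network and a contrastive saliency explanation could be wired together; the layer sizes, the cue head, and the logit-difference gradient are illustrative assumptions, not the RexNet implementation.

```python
import torch
import torch.nn as nn

class SpeechEmotionExplainer(nn.Module):
    """Toy multi-task model: shared speech encoder, emotion head, auxiliary cue head."""
    def __init__(self, n_mels=64, n_emotions=6):
        super().__init__()
        # Shared encoder over a log-mel spectrogram (assumed input representation).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
            nn.Linear(16 * 8 * 8, 128), nn.ReLU(),
        )
        self.emotion_head = nn.Linear(128, n_emotions)  # main prediction task
        self.cue_head = nn.Linear(128, 4)               # assumed 4 semantic vocal cues

    def forward(self, spec):
        z = self.encoder(spec)
        return self.emotion_head(z), self.cue_head(z)

def contrastive_saliency(model, spec, target_cls, contrast_cls):
    """Gradient of the logit difference (target - contrast) w.r.t. the input spectrogram."""
    spec = spec.detach().clone().requires_grad_(True)
    logits, _ = model(spec)
    (logits[0, target_cls] - logits[0, contrast_cls]).backward()
    return spec.grad.abs().squeeze(0)

model = SpeechEmotionExplainer()
spec = torch.randn(1, 1, 64, 100)   # (batch, channel, mel bins, frames)
saliency = contrastive_saliency(model, spec, target_cls=2, contrast_cls=0)
print(saliency.shape)               # torch.Size([1, 64, 100])
```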
Model explanations such as saliency maps can improve user trust in AI by highlighting important features for a prediction. However, these become distorted and misleading when explaining predictions of images that are subject to systematic error (bias) by perturbations and corruptions. Furthermore, the distortions persist despite model fine-tuning on images biased by different factors (blur, color temperature, day/night). We present Debiased-CAM to recover explanation faithfulness across various bias types and levels by training a multi-input, multi-task model with auxiliary tasks for explanation and bias level predictions. In simulation studies, the approach not only enhanced prediction accuracy, but also generated highly faithful explanations about these predictions as if the images were unbiased. In user studies, debiased explanations improved user task performance, perceived truthfulness and perceived helpfulness. Debiased training can provide a versatile platform for robust performance and explanation faithfulness for a wide range of applications with data biases.
https://dl.acm.org/doi/abs/10.1145/3491102.3517522
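A minimal sketch of the kind of multi-task objective this describes, assuming a toy backbone, a class-activation-map head weighted by the classifier weights, and arbitrary loss weights; this is not the paper's exact architecture or training configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DebiasedCamNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(64, n_classes)  # main prediction task
        self.bias_head = nn.Linear(64, 1)           # auxiliary: predict bias level (e.g. blur)

    def forward(self, x):
        fmap = self.backbone(x)                      # (B, 64, H, W) feature maps
        pooled = fmap.mean(dim=(2, 3))               # global average pooling
        logits = self.classifier(pooled)
        # CAM for the predicted class: weight feature maps by classifier weights.
        w = self.classifier.weight[logits.argmax(1)]  # (B, 64)
        cam = torch.einsum("bc,bchw->bhw", w, fmap)
        return logits, cam, self.bias_head(pooled).squeeze(1)

def debiased_cam_loss(logits, cam, bias_pred, labels, target_cam, bias_level,
                      lambda_cam=1.0, lambda_bias=0.1):
    """Classification loss + CAM loss against the unbiased image's CAM
    + bias-level regression, combined as a weighted sum (weights are assumptions)."""
    return (F.cross_entropy(logits, labels)
            + lambda_cam * F.mse_loss(cam, target_cam)
            + lambda_bias * F.mse_loss(bias_pred, bias_level))

model = DebiasedCamNet()
biased = torch.randn(4, 3, 32, 32)                   # biased (e.g. blurred) inputs
logits, cam, bias_pred = model(biased)
loss = debiased_cam_loss(logits, cam, bias_pred,
                         labels=torch.randint(0, 10, (4,)),
                         target_cam=torch.randn(4, 32, 32),  # stand-in for the unbiased CAM
                         bias_level=torch.rand(4))
loss.backward()
```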
Feedback in creativity support tools can help crowdworkers to improve their ideations. However, current feedback methods require human assessment from facilitators or peers, which does not scale to large crowds. We propose Interpretable Directed Diversity to automatically predict ideation quality and diversity scores and to provide AI explanations (Attribution, Contrastive Attribution, and Counterfactual Suggestions) as feedback on why ideations were scored low and how to obtain higher scores. These explanations provide multi-faceted feedback as users iteratively improve their ideations. We conducted formative and controlled user studies to understand the usage and usefulness of explanations for improving ideation diversity and quality. Users appreciated that explanation feedback helped focus their efforts and provided directions for improvement. As a result, explanations improved diversity compared to no feedback or feedback with scores only. Hence, our approach opens opportunities for explainable AI towards scalable and rich feedback for iterative crowd ideation and creativity support tools.
https://dl.acm.org/doi/abs/10.1145/3491102.3517551
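A toy sketch of the score-plus-attribution part of such a feedback loop, assuming a hash-based stand-in for the sentence encoder and a leave-one-out attribution; contrastive attributions and counterfactual suggestions are omitted, and none of this is the paper's model.

```python
import numpy as np

def embed(text, dim=16):
    """Stand-in embedding: a hash-seeded random vector (a real system would use
    a learned sentence encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def diversity_score(ideations):
    """Mean pairwise Euclidean distance between ideation embeddings."""
    vecs = np.stack([embed(t) for t in ideations])
    dists = [np.linalg.norm(a - b) for i, a in enumerate(vecs) for b in vecs[i + 1:]]
    return float(np.mean(dists))

def attribution(ideations):
    """Leave-one-out attribution: how much each ideation contributes to diversity.
    Low (or negative) values flag redundant ideations to revise."""
    base = diversity_score(ideations)
    return {t: base - diversity_score([u for u in ideations if u != t])
            for t in ideations}

ideas = ["reusable cup deposit scheme", "cup deposit scheme", "edible coffee cups"]
print(round(diversity_score(ideas), 3))
for idea, contrib in attribution(ideas).items():
    print(f"{contrib:+.3f}  {idea}")
```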
Deep learning models for image classification suffer from dangerous issues often discovered after deployment. The process of identifying the bugs that cause these issues remains limited and understudied. In particular, explainability methods are often presented as obvious tools for bug identification. Yet current practice lacks an understanding of which kinds of explanations best support the different steps of the bug identification process, and how practitioners could interact with those explanations. Through a formative study and an iterative co-creation process, we build an interactive design probe providing various potentially relevant explainability functionalities, integrated into interfaces that allow for flexible workflows. Using the probe, we perform 18 user studies with a diverse set of machine learning practitioners. Two-thirds of the practitioners engage in successful bug identification. They use multiple types of explanations, e.g., visual and textual ones, through non-standardized sequences of interactions including queries and exploration. Our results highlight the need for interactive, guiding interfaces with diverse explanations, shedding light on future research directions.
https://dl.acm.org/doi/abs/10.1145/3491102.3517474
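A small sketch of the kind of query-driven exploration such a probe could support, assuming a hypothetical model callable and labelled dataset: it groups misclassified images by error type and prints a textual summary before a practitioner opens visual explanations for the suspect group.

```python
from collections import defaultdict

def group_errors(model, dataset):
    """Return misclassified examples grouped by (true label, predicted label)."""
    buckets = defaultdict(list)
    for image, label in dataset:          # assumed iterable of (image, label) pairs
        pred = model(image)               # assumed to return a class id
        if pred != label:
            buckets[(label, pred)].append(image)
    return buckets

def summarize(buckets, n_total):
    """Textual summary complementing per-image visual explanations."""
    for (true, pred), images in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
        share = 100 * len(images) / n_total
        print(f"{len(images):4d} images ({share:.1f}%): labelled '{true}' but predicted '{pred}'")
        # A practitioner would next inspect saliency maps for these images to see
        # whether a spurious feature (the suspected bug) drives the errors.
```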