Studies show that users do not reliably click more often on headlines classified as clickbait by automated classifiers. Is this because the linguistic criteria (e.g., use of lists or questions) emphasized by the classifiers are not psychologically relevant in attracting interest, or because their classifications are confounded by other unknown factors associated with assumptions of the classifiers? We address these possibilities with three studies—a quasi-experiment using headlines classified as clickbait by three machine-learning models (Study 1), a controlled experiment varying the headline of an identical news story to contain only one clickbait characteristic (Study 2), and a computational analysis of four classifiers using real-world sharing data (Study 3). Studies 1 and 2 revealed that clickbait did not generate more curiosity than non-clickbait. Study 3 revealed that while some headlines generate more engagement, the detectors agreed on a classification only 47% of the time, raising fundamental questions about their validity.
How to attribute responsibility for autonomous artificial intelligence (AI) systems' actions has been widely debated across the humanities and social science disciplines. This work presents two experiments (N=200 each) that measure people's perceptions of eight different notions of moral responsibility concerning AI and human agents in the context of bail decision-making. Using real-life adapted vignettes, our experiments show that AI agents are held causally responsible and blamed similarly to human agents for an identical task. However, there was a meaningful difference in how people perceived these agents' moral responsibility; human agents were ascribed to a higher degree of present-looking and forward-looking notions of responsibility than AI agents. We also found that people expect both AI and human decision-makers and advisors to justify their decisions regardless of their nature. We discuss policy and HCI implications of these findings, such as the need for explainable AI in high-stakes scenarios.
In order to support fairness-forward thinking by machine learning (ML) practitioners, fairness researchers have created toolkits that aim to transform state-of-the-art research contributions into easily-accessible APIs. Despite these efforts, recent research indicates a disconnect between the needs of practitioners and the tools offered by fairness research. By engaging 20 ML practitioners in a simulated scenario in which they utilize fairness toolkits to make critical decisions, this work aims to utilize practitioner feedback to inform recommendations for the design and creation of fair ML toolkits. Through the use of survey and interview data, our results indicate that though fair ML toolkits are incredibly impactful on users’ decision-making, there is much to be desired in the design and demonstration of fairness results. To support the future development and evaluation of toolkits, this work offers a rubric that can be used to identify critical components of Fair ML toolkits.
With machine learning models being increasingly used to aid decision making even in high-stakes domains, there has been a growing interest in developing interpretable models. Although many supposedly interpretable models have been proposed, there have been relatively few experimental studies investigating whether these models achieve their intended effects, such as making people more closely follow a model's predictions when it is beneficial for them to do so or enabling them to detect when a model has made a mistake. We present a sequence of pre-registered experiments (N = 3,800) in which we showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box). Predictably, participants who saw a clear model with few features could better simulate the model's predictions. However, we did not find that participants more closely followed its predictions. Furthermore, showing participants a clear model meant that they were *less* able to detect and correct for the model's sizable mistakes, seemingly due to information overload. These counterintuitive findings emphasize the importance of testing over intuition when developing interpretable models.
In Human-AI collaborative settings that are inherently interactive, direction of communication plays a role in how users perceive their AI partners. In an AI-driven cooperative game with partially observable information, players (be it the AI or the human player) require their actions to be interpreted accurately by the other player to yield a successful outcome. In this paper, we investigate social perceptions of AI agents with various directions of communication in a cooperative game setting. We measure subjective social perceptions (rapport, intelligence, and likeability) of participants towards their partners when participants believe they are playing with an AI or with a human and the nature of the communication (responsiveness and leading roles). We ran a large scale study on Mechanical Turk (n=199) of this collaborative game and find significant differences in gameplay outcome and social perception across different AI agents, different directions of communication and when the agent is perceived to be an AI/Human. We find that the bias against the AI that has been demonstrated in prior studies varies with the direction of the communication and with the AI agent.
Artificial Intelligence (AI) education is an increasingly popular topic area for K-12 teachers. However, little research has investigated how AI curriculum and tools can be designed to be more accessible to all teachers and learners. In this study, we take a Value-Sensitive Design approach to understanding the role of teacher values in the design of AI curriculum and tools, and identifying opportunities to integrate AI into core curriculum to leverage learners' interests. We organized co-design workshops with 15 K-12 teachers, where teachers and researchers co-created lesson plans using AI tools and embedding AI concepts into various core subjects. We found that K-12 teachers need additional scaffolding in AI tools and curriculum to facilitate ethics and data discussions, and value supports for learner evaluation and engagement, peer-to-peer collaboration, and critical reflection. We present an exemplar lesson plan that shows entry points for teaching AI in non-computing subjects and reflect on co-designing with K-12 teachers in a remote setting.
In this paper, we report on perceptual experiments indicating that there are distinct and quantitatively measurable differences in the way we visually perceive genuine versus face-swapped videos.
Recent progress in deep learning has made face-swapping techniques a powerful tool for creative purposes, but also a means for unethical forgeries. Currently, it remains unclear why people are misled, and which indicators they use to recognize potential manipulations. Here, we conduct three perceptual experiments focusing on a wide range of aspects: the conspicuousness of artifacts, the viewing behavior using eye tracking, the recognition accuracy for different video lengths, and the assessment of emotions.
Our experiments show that responses differ distinctly when watching manipulated as opposed to original faces, from which we derive perceptual cues to recognize face swaps. By investigating physiologically measurable signals, our findings yield valuable insights that may also be useful for advanced algorithmic detection.
Domestic robots such as vacuum cleaners or lawnmowers are becoming popular consumer products in private homes, but while current HCI research on domestic robots has highlighted for example personalisation, long-term effects, or design guidelines, little attention has been paid to automation. To address this, we conducted a qualitative study with 24 participants in private households using interviews, contextual technology tours, and robot deployment. Through thematic analysis we identified three themes related to 1) work routines and automation, 2) domestic robot automation and the physical environment, as well as 3) interaction and breakdown intervention. We present an empirical understanding of how task automation using domestic robots can be implemented in the home. Lastly, we discuss our findings in relation to existing literature and highlight three opportunities for improved task automation using domestic robots for future research.
This work investigates the practices and challenges of voice user interface (VUI) designers. Existing VUI design guidelines recommend that designers strive for natural human-agent conversation. However, the literature leaves a critical gap regarding how designers pursue naturalness in VUIs and what their struggles are in doing so. Bridging this gap is necessary for identifying designers’ needs and supporting them. Our interviews with 20 VUI designers identified 12 ways that designers characterize and approach naturalness in VUIs. We categorized these characteristics into three groupings based on the types of conversational context that each characteristic contributes to: Social, Transactional, and Core. Our results contribute new findings on designers' challenges, such as a design dilemma in augmenting task-oriented VUIs with social conversations, difficulties in writing for spoken language, lack of proper tool support for imbuing synthesized voice with expressivity, and implications for developing design tools and guidelines.
Pair programming has a documented history of benefits, such as increased code quality, productivity, self-efficacy, knowledge transfer, and reduced gender gap. Research uncovered problems with pair programming related to scheduling, collocating, role imbalance, and power dynamics. We investigated the trade-offs of substituting a human with an agent to simultaneously provide benefits and alleviate obstacles in pair programming. We conducted gender-balanced studies with human-human pairs in a remote lab with 18 programmers and Wizard-of-Oz studies with 14 programmers, then analyzed results quantitatively and qualitatively. Our comparative analysis of the two studies showed no significant differences in productivity, code quality, and self-efficacy. Further, agents facilitated knowledge transfer; however, unlike humans, agents were unable to provide logical explanations or discussions. Human partners trusted and showed humility towards agents. Our results demonstrate that agents can act as effective pair programming partners and open the way towards new research on conversational agents for programming.
Perceptions of system competence and communicative ability, termed partner models, play a significant role in speech interface interaction. Yet we do not know what the core dimensions of this concept are. Taking a psycholexical approach, our paper is the first to identify the key dimensions that define partner models in speech agent interaction. Through a repertory grid study (N=21), a review of key subjective questionnaires, an expert review of resulting word pairs and an online study of 356 users of speech interfaces, we identify three key dimensions that make up a users’ partner model: 1) perceptions towards partner competence and dependability; 2) assessment of human-likeness; and 3) a system’s perceived cognitive flexibility. We discuss the implications for partner modelling as a concept, emphasising the importance of salience and the dynamic nature of these perceptions.
The uptake of artificial intelligence-based applications raises concerns about the fairness and transparency of AI behaviour. Consequently, the Computer Science community calls for the involvement of the general public in the design and evaluation of AI systems. Assessing the fairness of individual predictors is an essential step in the development of equitable algorithms. In this study, we evaluate the effect of two common visualisation techniques (text-based and scatterplot) and the display of the outcome information (i.e., ground-truth) on the perceived fairness of predictors. Our results from an online crowdsourcing study (N = 80) show that the chosen visualisation technique significantly alters people's fairness perception and that the presented scenario, as well as the participant's gender and past education, influence perceived fairness. Based on these results we draw recommendations for future work that seeks to involve non-experts in AI fairness evaluations.
Current personal informatics models consider reflection as an important stage in users' journeys with trackers. However, these models describe reflection from a meta perspective and it remains unclear what this stage entails. To design interactive technologies that support reflection, we need a more thorough understanding of how people reflect on their personal data in practice. To that end, we conducted semi-structured interviews with users of fitness trackers and an online survey to study practices in reflecting on fitness data. Our results show that users reported reflecting on data despite lacking reflection support from their tracking technology. Based on our results, we introduce the Technology-Mediated Reflection Model, which describes conditions and barriers for reflection on personal data. Our model consists of the temporal and conceptual cycles of reflection and helps designers identify the possible barriers a user might face when using a system for reflection.