Algorithmic timeline curation is now an integral part of Twitter's platform, affecting information exposure for more than 150 million daily active users. Despite its large-scale, high-stakes impact, especially during a public health emergency such as the COVID-19 pandemic, the effects of Twitter's curation algorithm remain largely unknown. In this work, we present a sock-puppet audit that characterizes the effects of algorithmic curation on source diversity and content diversity in Twitter timelines. We created eight sock-puppet accounts to emulate representative real-world users, selected through a large-scale network analysis. Then, for one month in early 2020, we collected the puppets' timelines twice per day. Broadly, our results show that algorithmic curation increases source diversity in terms of both Twitter accounts and external domains, even though it drastically decreases the number of external links in the timeline. In terms of content diversity, algorithmic curation had a mixed effect, slightly amplifying a cluster of politically focused tweets while squelching a cluster of tweets focused on COVID-19 fatalities. Finally, we present some evidence that the timeline algorithm may exacerbate partisan differences in exposure to different sources and content. The paper concludes by discussing broader implications in the context of algorithmic gatekeeping.
https://doi.org/10.1145/3449152
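The study above does not specify its data-collection tooling, but a minimal sketch of the twice-daily timeline collection it describes could look like the following. The tweepy library, Twitter's v1.1 home_timeline endpoint, and the per-puppet credentials are assumptions for illustration only, not the authors' actual pipeline.

```python
# Minimal sketch of twice-daily home-timeline collection for sock-puppet accounts.
# Assumptions (not stated in the paper): the tweepy library, Twitter's v1.1
# home_timeline endpoint, and one set of API credentials per puppet (placeholders).
import json
import time
from datetime import datetime, timezone

import tweepy

PUPPET_CREDENTIALS = {
    # "puppet_1": dict(consumer_key="...", consumer_secret="...",
    #                  access_token="...", access_token_secret="..."),
}


def collect_timeline(name, creds, count=200):
    """Save one snapshot of a puppet's algorithmically curated home timeline."""
    auth = tweepy.OAuth1UserHandler(**creds)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    tweets = api.home_timeline(count=count)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    with open(f"{name}_{stamp}.json", "w") as f:
        json.dump([t._json for t in tweets], f)


if __name__ == "__main__":
    while True:
        for name, creds in PUPPET_CREDENTIALS.items():
            collect_timeline(name, creds)
        time.sleep(12 * 60 * 60)  # two snapshots per day
```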
The rise of geotargeted online advertising has disrupted the business model of local journalism, but it remains unclear whether online advertising platforms can effectively reach local audiences. To address this ambiguity, we present a focused study auditing the positional accuracy of geotargeted display advertisements on Google. We measure the frequency and severity of geotargeting errors by targeting display ads to random ZIP codes across the United States and collecting self-reported location information from users who click on the advertisements. We find that geotargeting errors are common but minor with respect to advertising goals: while 41% of respondents lived outside the target ZIP code, only 11% lived outside the target county, and only 2% lived outside the target state. We also detail a high volume of suspicious clicks in our data, which drove the cost per valid sample extremely high. The paper concludes by discussing implications for advertisers and the business of local journalism.
https://doi.org/10.1145/3449166
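As a toy illustration (not the authors' code) of how the ZIP, county, and state error rates above could be tabulated from click-through survey responses, assuming a CSV of self-reported locations with hypothetical column names:

```python
# Toy tabulation of geotargeting error rates at increasing levels of severity.
# The file name and the target_* / reported_* columns are hypothetical, not
# the study's actual schema.
import pandas as pd

responses = pd.read_csv("ad_click_survey.csv")  # one row per survey respondent

for level in ("zip", "county", "state"):
    error_rate = (responses[f"reported_{level}"] != responses[f"target_{level}"]).mean()
    print(f"outside target {level}: {error_rate:.0%}")
```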
Smart speakers are becoming increasingly ubiquitous in society and are now used to satisfy a variety of information needs, from asking about the weather or traffic to accessing the latest breaking news. Their growing use for news and information consumption raises new questions about the quality, source diversity, and comprehensiveness of the news-related information they convey. These questions have significant implications for voice assistant technologies acting as algorithmic information intermediaries, yet systematic information quality audits have not yet been undertaken. To address this gap, we develop a methodological approach for evaluating information quality in voice assistants for news-related queries. We demonstrate the approach on the Amazon Alexa voice assistant, first characterizing Alexa's performance in terms of response relevance, accuracy, and timeliness, and then elaborating analyses of information quality based on query phrasing, news category, and information provenance. We discuss the implications of our findings for the design of future smart speaker devices and, more broadly, for the consumption of news via such algorithmic intermediaries.
https://doi.org/10.1145/3449157
A growing body of literature has proposed formal approaches to auditing algorithmic systems for biased and harmful behaviors. While formal auditing approaches have been highly impactful, they often suffer from major blind spots, with critical issues surfacing only in the context of everyday use once systems are deployed. Recent years have seen many cases in which everyday users of algorithmic systems detect and raise awareness about harmful behaviors that they encounter in the course of their everyday interactions with these systems. However, to date little academic attention has been paid to these bottom-up, user-driven auditing processes. In this paper, we propose and explore the concept of everyday algorithm auditing, a process in which users detect, understand, and interrogate problematic machine behaviors via their day-to-day interactions with algorithmic systems. We argue that everyday users are powerful in surfacing problematic machine behaviors that may elude detection via more centrally organized forms of auditing, regardless of users' knowledge about the underlying algorithms. We analyze several real-world cases of everyday algorithm auditing, drawing lessons from these cases for the design of future platforms and tools that facilitate such auditing behaviors. Finally, we discuss the work that lies ahead toward bridging the gaps between formal auditing approaches and the organic auditing behaviors that emerge in everyday use of algorithmic systems.
https://doi.org/10.1145/3479577
Large, ever-evolving technology companies continue to invest time and resources in incorporating responsible Artificial Intelligence (AI) into production-ready systems to increase algorithmic accountability. This paper examines how organizational culture and structure affect the effectiveness of responsible AI initiatives in practice, and offers a framework for analyzing those effects. We present the results of semi-structured qualitative interviews with industry practitioners, investigating common challenges, ethical tensions, and effective enablers of responsible AI initiatives. Focusing on major companies that develop or use AI, we map which organizational structures currently support or hinder responsible AI initiatives, which aspirational future processes and structures would best enable effective initiatives, and which key elements make up the transition from current work practices to that aspirational future.
https://doi.org/10.1145/3449081
Algorithmically mediated content is both a product and a creator of dominant social narratives, and it has the potential to shape users' beliefs and behaviors. We present two studies on the content and impact of gender and racial representation in image search results for common occupations. In Study 1, we compare 2020 workforce gender and racial composition to that reflected in image search. We find evidence of underrepresentation on both dimensions: for an occupation that is 50% women, search results show about 42% women (comparable to 2015 levels of underrepresentation); for an occupation that is 22% people of color (in proportion to the U.S. workforce), search results show 16% people of color. We also compare our gender representation data with that collected in 2015 by Kay et al., finding little improvement over the past half-decade. In Study 2, we examine people's impressions of occupations and their sense of belonging in a given field when shown search results with different proportions of women and people of color. We find that both types of representation, as well as people's own racial and gender identities, affect their experience of image search results. We conclude by emphasizing the need for designers and auditors of algorithms to consider the disparate impacts of algorithmic content on users with marginalized identities.
https://doi.org/10.1145/3449100
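As a small, purely illustrative sketch of the Study 1 comparison above between search-result composition and workforce composition, one could compute percentage-point gaps as follows; the occupation labels and figures below are hypothetical placeholders that only echo the magnitudes reported in the abstract.

```python
# Compare the share of women or people of color in labeled image search results
# against workforce composition, expressed as percentage-point gaps.
# All figures below are illustrative placeholders, not the study's data.

def representation_gap(search_share: float, workforce_share: float) -> float:
    """Percentage-point gap; negative values indicate underrepresentation in search."""
    return search_share - workforce_share

occupations = {
    #                  (share in search results, share in workforce)
    "example_field_a": (0.42, 0.50),  # women
    "example_field_b": (0.16, 0.22),  # people of color
}

for occupation, (in_search, in_workforce) in occupations.items():
    gap = representation_gap(in_search, in_workforce)
    print(f"{occupation}: {gap:+.1%} relative to the workforce")
```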
While algorithm audits are growing rapidly in importance and commonality, relatively little scholarly work has synthesized prior studies or charted future research in the area. This systematic literature review aims to fill that gap, following PRISMA guidelines in a review of over 500 English-language articles that yielded 62 algorithm audit studies. The studies are synthesized and organized primarily by behavior (discrimination, distortion, exploitation, and misjudgement), with codes also provided for domain (e.g., search, vision, advertising), organization (e.g., Google, Facebook, Amazon), and audit method (e.g., sock puppet, direct scrape, crowdsourcing). The review shows that previous audit studies have exposed powerful algorithms exhibiting problematic behavior, such as search algorithms culpable of distortion and advertising algorithms culpable of discrimination. It also identifies behaviors, domains, methods, and organizations that call for future audit attention, such as problematic "echo chambers" and other distortion effects from advertising algorithms. The paper concludes by discussing algorithm auditing in the context of other research working toward algorithmic justice.
https://doi.org/10.1145/3449148
In the era of big data and artificial intelligence, online risk detection has become a popular research topic. From detecting online harassment to the sexual predation of youth, state-of-the-art computational risk detection has the potential to protect particularly vulnerable populations from online victimization. Yet this is a high-risk, high-reward endeavor that requires a systematic and human-centered approach to synthesizing disparate bodies of research across application domains, so that we can identify best practices and potential gaps and set a strategic research agenda for leveraging these approaches in ways that benefit society. We therefore conducted a comprehensive literature review of 73 peer-reviewed articles on computational approaches that use text or metadata/multimedia for online sexual risk detection. We identified sexual grooming (75%), sex trafficking (12%), and sexual harassment and/or abuse (12%) as the three types of sexual risk detection present in the extant literature. Furthermore, we found that the majority (93%) of this work has focused on identifying sexual predators after the fact, rather than taking more nuanced approaches to identify potential victims and problematic patterns that could be used to prevent victimization before it occurs. Many studies rely on public datasets (82%) and third-party annotators (33%) to establish ground truth and train their algorithms. Finally, most of this work (78%) focused on evaluating the algorithmic performance of the models, and very few studies (4%) evaluated these systems with real users. We therefore urge computational risk detection researchers to integrate more human-centered approaches into both developing and evaluating sexual risk detection algorithms, to ensure the broader societal impact of this important work.
https://doi.org/10.1145/3479609