There has been a great deal of scholarly attention on issues of identity-related bias in machine learning. Much of this attention has focused on data and data workers, workers who do annotation tasks. Yet tech workers—like engineers, data scientists, and researchers—introduce their own “biases” when defining “identity” concepts. More specifically, they instill their own positionalities, the way they understand and are shaped by the world around them. Through interviews with industry tech workers who focus on computer vision, we show how workers embed their own positional perspectives into products and how positional gaps can lead to unforeseen and undesirable outcomes. We discuss how worker positionality is mutually shaped by the contexts in which they are embedded. We provide implications for researchers and practitioners to engage with the positionalities of tech workers, as well as those in contexts outside of development that influence tech workers.
https://doi.org/10.1145/3613904.3641890
The link between visual attention and users' information needs during the visual exploration of information visualisations remains under-explored due to a lack of large and diverse datasets to facilitate such analyses. To fill this gap, we introduce SalChartQA -- a novel crowd-sourced dataset that uses the BubbleView interface to track user attention and a question-answering (QA) paradigm to induce different information needs in users. SalChartQA contains 74,340 answers to 6,000 questions on 3,000 visualisations. Informed by our analyses demonstrating the close correlation between information needs and visual saliency, we propose the first computational method to predict question-driven saliency on visualisations. Our method outperforms state-of-the-art saliency models on several metrics, such as the correlation coefficient and the Kullback-Leibler divergence. These results show the importance of information needs for shaping attentive behaviour and pave the way for new applications, such as task-driven optimisation of visualisations or explainable AI in chart question-answering.
https://doi.org/10.1145/3613904.3642942
Although phones have become cheaper, there are still instances where people share them. Researchers have explored sharing in the context of developing economies, brought to light the barriers to ownership, and highlighted the resulting power differentials. In this work, we explore the dynamics of single and multi-device ownership and sharing in Kenya. Through interviews with 34 participants, we seek to understand what these ownership patterns tell us about affordances and unstated needs--adding to our knowledge of device usage. We find that these dimensions of ownership raise new questions about ethics and survival, and we describe how they also serve as bellwethers for designing for a developing economy--especially in the context of access to money and other financial infrastructures. We discuss the impact and harms of unregulated policies and the influence of survival on people's choices, the implications for ethics, and further explore strategies for identifying, auditing, and mitigating these risks.
https://doi.org/10.1145/3613904.3642874
Systemic property dispossession from minority groups has often been carried out in the name of technological progress. In this paper, we identify evidence that the current paradigm of large language models (LLMs) likely continues this long history. Examining common LLM training datasets, we find that a disproportionate amount of content authored by Jewish Americans is used for training without their consent. The degree of over-representation ranges from around 2x to around 6.5x. Given that LLMs may substitute for the paid labor of those who produced their training data, they have the potential to cause even more substantial and disproportionate economic harm to Jewish Americans in the coming years. This paper focuses on Jewish Americans as a case study, but it is probable that other minority communities (e.g., Asian Americans, Hindu Americans) are similarly affected and, most importantly, the results should likely be interpreted as a "canary in the coal mine" that highlights deep structural concerns about the current LLM paradigm whose harms could soon affect nearly everyone. We discuss the implications of these results for policymakers thinking about how to regulate LLMs as well as for those in the AI field who are working to advance LLMs. Our findings stress the importance of working together towards alternative LLM paradigms that avoid both disparate impacts and widespread societal harms.
https://doi.org/10.1145/3613904.3642749
With changing attitudes around knowledge, medicine, art, and technology, the human body has become a source of information and, ultimately, shareable and analyzable data. Centuries of illustrations and visualizations of the body occur within particular historical, social, and political contexts. These contexts are enmeshed in different so-called data cultures: ways that data, knowledge, and information are conceptualized and collected, structured and shared. In this work, we explore how information about the body was collected as well as the circulation, impact, and persuasive force of the resulting images. We show how mindfulness of data cultural influences remains crucial for today's designers, researchers, and consumers of visualizations. We conclude with a call for the field to reflect on how visualizations are not timeless and contextless mirrors of objective data, but as much a product of our time and place as the visualizations of the past.
https://doi.org/10.1145/3613904.3642056