Topic models are widely used analysis techniques for clustering documents and surfacing thematic elements of text corpora. These models remain challenging to optimize and often require a ``human-in-the-loop'' approach where domain experts use their knowledge to steer and adjust them. However, the fragility, incompleteness, and opacity of these models mean that even minor changes can induce large and potentially undesirable changes in the resulting model. In this paper we conduct a simulation-based analysis of human-centered interactions with topic models, with the objective of measuring the sensitivity of topic models to common classes of user actions. We find that user interactions have impacts that differ in magnitude but often negatively affect the quality of the resulting model in ways that can be difficult for the user to evaluate. We suggest incorporating sensitivity and ``multiverse'' analyses into topic model interfaces to surface and overcome these deficiencies.
AutoML systems can speed up routine data science work and make machine learning available to those without expertise in statistics and computer science. These systems have gained traction in enterprise settings where pools of skilled data workers are limited. In this study, we conduct interviews with 29 individuals from organizations of different sizes to characterize how they currently use, or intend to use, AutoML systems in their data science work. Our investigation also captures how data visualization is used in conjunction with AutoML systems. Our findings identify three usage scenarios for AutoML, which we distill into a framework summarizing the level of automation desired by data workers with different levels of expertise. We surfaced the tension between speed and human oversight and found that data visualization can do a poor job of balancing the two. Our findings have implications for the design and implementation of human-in-the-loop visual analytics approaches.
More visualization systems are simplifying the data analysis process by automatically suggesting relevant visualizations. However, little work has been done to understand whether users trust these automated recommendations. In this paper, we present the results of a crowd-sourced study exploring preferences and perceived quality of recommendations that have been positioned as either human-curated or algorithmically generated. We observe that while participants initially prefer human recommenders, their actions suggest indifference to recommendation source when evaluating visualization recommendations. The relevance of presented information (e.g., the presence of certain data fields) was the most critical factor, followed by a belief in the recommender's ability to create accurate visualizations. Our findings suggest a general indifference towards the provenance of recommendations, and point to idiosyncratic definitions of visualization quality and trustworthiness that may not be captured by simple measures. We suggest that recommendation systems should be tailored to the information-foraging strategies of specific users.
Root cause analysis is a common data analysis task. While question-answering systems enable people to easily articulate a why question (e.g., why students in Massachusetts have high ACT Math scores on average) and obtain an answer, these systems often produce questionable causal claims. To investigate how such claims might mislead users, we conducted two crowdsourced experiments to study the impact of showing different information on user perceptions of a question-answering system. We found that in a system that occasionally provided unreasonable responses, showing a scatterplot increased the plausibility of unreasonable causal claims. Also, simply warning participants that correlation is not causation seemed to lead participants to accept reasonable causal claims more cautiously. We observed a strong tendency among participants to associate correlation with causation, yet the warning appeared to reduce this tendency. Grounded in these findings, we propose ways to reduce the illusion of causality when using question-answering systems.
Creating expressive narrative visualization often requires choosing a well-planned narrative order that invites the audience in. The narrative can either follow the linear order of story events (chronology) or deviate from linearity (anachronies). While evidence exists that anachronies in novels and films can enhance story expressiveness, little is known about how they can be incorporated into narrative visualization. To bridge this gap, this work introduces the idea of narrative linearity to visualization and investigates how different narrative orders affect the expressiveness of time-oriented stories. First, we conducted preliminary interviews with seven experts to clarify the motivations and challenges of manipulating narrative linearity in time-oriented stories. Then, we analyzed a corpus of 80 time-oriented stories and identified the six most salient patterns of narrative orders. Next, we conducted a crowdsourcing study with 221 participants. Results indicated that anachronies have the potential to make time-oriented stories more expressive without hindering comprehensibility.
Data videos are a genre of narrative visualization that communicates stories by combining data visualization and motion graphics. While data videos are increasingly gaining popularity, few systematic reviews or structured analyses exist for their design. In this work, we introduce a design space for animated visual narratives in data videos. The design space combines a dimension for animation techniques that are frequently used to facilitate data communication with one for visual narrative strategies served by such animation techniques to support story presentation. We derived our design space from the analysis of 82 high-quality data videos collected from online sources. We conducted a workshop with 20 participants to evaluate the effectiveness of our design space. Qualitative and quantitative feedback suggested that our design space is inspirational and useful for designing and creating data videos.
Using visualization requires people to read abstract visual imagery, estimate statistics, and retain information. However, people with Intellectual and Developmental Disabilities (IDDs) often process information differently, which may complicate connecting abstract visual information to real-world quantities. This population has traditionally been excluded from visualization design, and often has limited access to data related to their well-being. We explore how visualizations may better serve this population. We identify three visualization design elements that may improve data accessibility: chart type, chart embellishment, and data continuity. We evaluate these elements with populations both with and without IDDs, measuring accuracy and efficiency in a web-based online experiment with time series and proportion data. Our study identifies performance patterns and subjective preferences for people with IDDs when reading common visualizations. These findings suggest possible solutions that may break the cognitive barriers caused by conventional design guidelines.
Controversial understandings of the coronavirus pandemic have turned data visualizations into a battleground. Defying public health officials, coronavirus skeptics on US social media spent much of 2020 creating data visualizations showing that the government’s pandemic response was excessive and that the crisis was over. This paper investigates how pandemic visualizations circulated on social media, and shows that people who mistrust the scientific establishment often deploy the same rhetorics of data-driven decision-making used by experts, but to advocate for radical policy changes. Using a quantitative analysis of how visualizations spread on Twitter and an ethnographic approach to analyzing conversations about COVID data on Facebook, we document an epistemological gap that leads pro- and anti-mask groups to draw drastically different inferences from similar data. Ultimately, we argue that the deployment of COVID data visualizations reflects a deeper sociopolitical rift regarding the place of science in public life.
In response to COVID-19, a vast number of visualizations have been created to communicate information to the public. Information exposure in a public health crisis can impact people’s attitudes towards and responses to the crisis and its risks, and ultimately the trajectory of a pandemic. As such, there is a need for work that documents, organizes, and investigates what COVID-19 visualizations have been presented to the public. We address this gap through an analysis of 668 COVID-19 visualizations. We present our findings through a conceptual framework derived from our analysis that examines who, (uses) what data, (to communicate) what messages, in what form, under what circumstances in the context of COVID-19 crisis visualizations. We provide a set of factors to be considered within each component of the framework. We conclude with directions for future crisis visualization research.
Interaction enables users to navigate large amounts of data effectively, supports cognitive processing, and expands the range of data representation methods. However, despite popular belief in its benefits, there have been few attempts to empirically demonstrate whether adding interaction to a static visualization improves its function. In this paper, we address this gap. We use a classic Bayesian reasoning task as a testbed for evaluating whether allowing users to interact with a static visualization can improve their reasoning. Through two crowdsourced studies, we show that adding interaction to a static Bayesian reasoning visualization does not improve participants’ accuracy on a Bayesian reasoning task; in some cases, it can significantly detract from it. Moreover, we demonstrate that the underlying visualization design modulates performance and that people with high versus low spatial ability respond differently to different interaction techniques and underlying base visualizations. Our work suggests that interaction is not as unambiguously good as we often believe; a well-designed static visualization can be as effective as, if not more effective than, an interactive one.
Charts often contain visually prominent features that draw attention to aspects of the data, and they frequently include text captions that emphasize particular aspects of the data. Through a crowdsourced study, we explore how readers gather takeaways when considering charts and captions together. We first ask participants to mark visually prominent regions in a set of line charts. We then generate text captions based on the prominent features and ask participants to report their takeaways after observing chart-caption pairs. We find that when both the chart and caption describe a high-prominence feature, readers treat the doubly emphasized high-prominence feature as the takeaway; when the caption describes a low-prominence chart feature, readers rely on the chart and report a higher-prominence feature as the takeaway. We also find that external information that provides context helps further convey the caption’s message to the reader. We use these findings to provide guidelines for authoring effective chart-caption pairs.
Visualizations designed to make readers feel compassion for the persons whose data they represent have been called anthropographics and are commonly employed by practitioners. Empirical studies have recently examined whether anthropographics indeed promote empathy, compassion, or the likelihood of prosocial behavior, but findings have been inconclusive so far. This work contributes a detailed overview of past experiments and two new experiments that use large samples and a combination of design strategies to maximize the possibility of finding an effect. We tested an information-rich anthropographic against a simple bar chart, asking participants to allocate hypothetical money in a crowdsourcing study. We found that the anthropographic had, at best, a small effect on money allocation. Such a small effect may be relevant for large-scale donation campaigns, but the large sample sizes required to observe an effect and the noise involved in measuring it make it very difficult to study in more depth. Data and code are available at https://osf.io/xqae2/.
This research investigates how people engage with data visualizations when commenting on the social platform Reddit. There has been considerable research on collaborative sensemaking with visualizations and on people’s personal relation to data. Yet, little is known about how public audiences without specific expertise and shared incentives openly express their thoughts, feelings, and insights in response to data visualizations. Motivated by the extensive social exchange around visualizations in online communities, this research examines the characteristics and motivations of people’s reactions to posts featuring visualizations. Following a Grounded Theory approach, we study 475 reactions from the /r/dataisbeautiful community, identify ten distinguishable reaction types, and consider their contribution to the discourse. A follow-up survey with 168 Reddit users clarified their intentions to react. Our results further our understanding of the role of personal perspectives on data and inform future interfaces that integrate audience reactions into visualizations to foster a public discourse about data.
Infographics range from minimalist designs that aim to convey the raw data to elaborately decorated, or embellished, graphics that aim to engage readers by telling a story. Studies have shown evidence of negative, but also positive, effects of embellishments. We conducted a set of experiments to gauge more precisely how embellishments affect how people relate to infographics and make sense of the conveyed story. We analyzed questionnaires, interviews, and eye-tracking data simplified by bundling to find how embellishments affect the reading of infographics, beyond engagement, memorization, and recall. We found that, within bounds, embellishments have a positive effect on how users engage with and come to understand an infographic, with very limited downside. To our knowledge, our work is the first that fuses the aforementioned three information sources gathered from the same data-and-user corpus to understand infographics. Our findings can help to design more fine-grained studies to quantify embellishment effects and also to design infographics that effectively use embellishments.