The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances. We conduct formative research with machine learning practitioners at Apple and find that conventional confusion matrices do not support the more complex data structures found in modern-day applications, such as hierarchical and multi-output labels. To express such variations of confusion matrices, we design an algebra that models confusion matrices as probability distributions. Based on this algebra, we develop Neo, a visual analytics system that enables practitioners to flexibly author and interact with hierarchical and multi-output confusion matrices, visualize derived metrics, renormalize confusions, and share matrix specifications. Finally, we demonstrate Neo's utility with three model evaluation scenarios that help people better understand model performance and reveal hidden confusions.
https://dl.acm.org/doi/abs/10.1145/3491102.3501823
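As a rough illustration of treating a confusion matrix as a probability distribution, the Python below estimates the joint distribution P(actual, predicted) from paired labels, renormalizes it row by row into P(predicted | actual), and merges cells under a label hierarchy. This is a minimal sketch of the general idea only, not Neo's algebra or API; the function names (`joint_distribution`, `condition_on_actual`, `coarsen`) and the example labels and hierarchy are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def joint_distribution(actual, predicted):
    """Estimate P(actual, predicted) from paired label sequences (hypothetical helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one labeled instance")
    counts = Counter(zip(actual, predicted))
    total = len(actual)
    # Each cell of the confusion matrix becomes a probability mass.
    return {pair: Fraction(n, total) for pair, n in counts.items()}


def marginal(joint, axis):
    """Sum out one variable: axis=0 keeps actual labels, axis=1 keeps predicted labels."""
    out = {}
    for pair, prob in joint.items():
        out[pair[axis]] = out.get(pair[axis], Fraction(0)) + prob
    return out


def condition_on_actual(joint):
    """Renormalize each row: P(predicted | actual) = P(actual, predicted) / P(actual)."""
    row_mass = marginal(joint, 0)
    return {(a, p): prob / row_mass[a] for (a, p), prob in joint.items()}


def coarsen(joint, parent_of):
    """Merge cells by mapping each label to its parent class in a hypothetical hierarchy."""
    out = {}
    for (a, p), prob in joint.items():
        key = (parent_of[a], parent_of[p])
        out[key] = out.get(key, Fraction(0)) + prob
    return out


if __name__ == "__main__":
    # Illustrative labels only; not data from the paper.
    actual = ["cat", "cat", "dog", "dog", "sparrow", "sparrow"]
    predicted = ["cat", "dog", "dog", "dog", "cat", "sparrow"]
    joint = joint_distribution(actual, predicted)
    print(joint[("cat", "dog")])  # 1/6
    print(condition_on_actual(joint)[("cat", "dog")])  # 1/2
    parent = {"cat": "mammal", "dog": "mammal", "sparrow": "bird"}
    print(coarsen(joint, parent)[("mammal", "mammal")])  # 2/3
```

Using exact fractions keeps the renormalized rows summing to exactly one; a real system would more likely work with floating-point arrays and would also need to handle multi-output labels, which this sketch does not attempt.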
With the wide use of data visualizations, a huge number of Scalable Vector Graphics (SVG)-based visualizations have been created and shared online. Accordingly, there is growing interest in retrieving perceptually similar visualizations from a large corpus, since doing so can benefit various downstream applications such as visualization recommendation. Existing methods mainly focus on the visual appearance of visualizations by treating them as bitmap images. However, they ignore the structural information inherent in SVG-based visualizations. Such structural information can delineate the spatial and hierarchical relationships among visual elements and characterize visualizations thoroughly from a new perspective. This paper presents a structure-aware method that improves visualization retrieval performance by jointly considering both visual and structural information. We extensively evaluated our approach through quantitative comparisons, a user study, and case studies. The results demonstrate the effectiveness of our approach and its advantages over existing methods.
https://dl.acm.org/doi/abs/10.1145/3491102.3502048
Data visualizations are created and shared on the web at an unprecedented speed, raising new needs and questions for processing and analyzing visualizations after they have been generated and digitized. However, existing formalisms focus on operating on a single visualization instead of multiple visualizations, making it challenging to perform analysis tasks such as sorting and clustering visualizations. Through a systematic analysis of previous work, we abstract visualization-related tasks into mathematical operators such as union and propose a design space of visualization operations. We realize this design space by developing ComputableViz, a library that supports operations on multiple visualization specifications. To demonstrate its usefulness and extensibility, we present multiple usage scenarios for processing and analyzing visualizations, such as generating visualization embeddings and automatically making visualizations accessible. We conclude by discussing research opportunities and challenges for managing and exploiting the massive number of visualizations on the web.
https://dl.acm.org/doi/abs/10.1145/3491102.3517618
The promise of visualization recommendation systems is that analysts will be automatically provided with relevant, high-quality visualizations that reduce the work of manual exploration or chart creation. However, little research to date has focused on what analysts value in the design of visualization recommendations. We interviewed 18 analysts in the public health sector and explored how they made sense of a popular in-domain dataset (the National Health and Nutrition Examination Survey 2013-2014 [centers2013nhanes]) in service of generating visualizations to recommend to others. We also explored how they interacted with a corpus of both automatically and manually generated visualization recommendations, with the goal of uncovering how the design values of these analysts are reflected in current visualization recommendation systems. We find that analysts champion simple charts with clear takeaways that are nonetheless connected with existing semantic information or domain hypotheses. We conclude by recommending that designers of visualization recommendation systems explore ways of integrating context and expectation into their systems.
https://dl.acm.org/doi/abs/10.1145/3491102.3501891
Data exploration systems have become popular tools with which data analysts and others can explore raw data and organize their observations. However, users of such systems who are unfamiliar with their datasets face several challenges when trying to extract data events of interest. These challenges include progressively discovering informative charts, organizing them into a logical order to depict a meaningful fact, and arranging one or more facts to illustrate a data event. To address these challenges, we propose VisGuide, a data exploration system that generates personalized recommendations to help users discover data events in breadth and depth by incrementally learning their data exploration preferences and recommending meaningful charts tailored to them. In addition to user preferences, VisGuide’s recommendations also consider sequence organization and chart presentation. We conducted two user studies to evaluate 1) the usability of VisGuide and 2) user satisfaction with its recommendation system. The results indicate that VisGuide can effectively help users create coherent, user-oriented visualization trees that represent meaningful data events.
https://dl.acm.org/doi/abs/10.1145/3491102.3517648