Dimensionality reduction (DR) techniques are essential for visually analyzing high-dimensional data. However, visual analytics using DR often face unreliability, stemming from factors such as inherent distortions in DR projections. This unreliability can lead to analytic insights that misrepresent the underlying data, potentially resulting in misguided decisions. To tackle these reliability challenges, we review 133 papers that address the unreliability of visual analytics using DR. Through this review, we contribute (1) a workflow model that describes the interaction between analysts and machines in visual analytics using DR, and (2) a taxonomy that identifies where and why reliability issues arise within the workflow, along with existing solutions for addressing them. Our review reveals ongoing challenges in the field, whose significance and urgency are validated by five expert researchers. This review also finds that the current research landscape is skewed toward developing new DR techniques rather than their interpretation or evaluation, where we discuss how the HCI community can contribute to broadening this focus.
Participatory data physicalisation (PDP) is recognised for its potential to support data-driven decisions among stakeholders who collaboratively construct physical elements into commonly insightful visualisations.
Like all participatory processes, PDP is however influenced by underlying power dynamics that might lead to issues regarding extractive participation, marginalisation, or exclusion, among others.
We first identified the decisions behind these power dynamics by developing an ontology that synthesises critical theoretical insights from both visualisation and participatory design research, which were then
systematically applied unto a representative corpus of 23 PDP artefacts.
By revealing how shared decisions are guided by different agendas, this paper presents three contributions:
1) a cross-disciplinary ontology that facilitates the systematic analysis of existing and novel PDP artefacts and processes; which leads to
2) six PDP agendas that reflect the key power dynamics in current PDP practice, revealing the diversity of orientations towards stakeholder participation in PDP practice; and
3) a set of critical considerations that should guide how power dynamics can be balanced, such as by reflecting on how issues are represented, data is contextualised, participants express their meanings, and how participants can dissent with flexible artefact construction.
Consequently, this study advances a feminist research agenda by guiding researchers and practitioners in openly reflecting on and sharing responsibilities in data physicalisation and participatory data visualisation.
Independent algorithm audits hold the promise of bringing accountability to automated decision-making. However, third-party audits are often hindered by access restrictions, forcing auditors to rely on limited, low-quality data. To study how these limitations impact research integrity, we conduct audit simulations on two realistic case studies for recidivism and healthcare coverage prediction. We examine the accuracy of estimating group parity metrics across three levels of access: (a) aggregated statistics, (b) individual-level data with model outputs, and (c) individual-level data without model outputs. Despite selecting one of the simplest tasks for algorithmic auditing, we find that data minimization and anonymization practices can strongly increase error rates on individual-level data, leading to unreliable assessments. We discuss implications for independent auditors, as well as potential avenues for HCI researchers and regulators to improve data access and enable both reliable and holistic evaluations.
Despite significant work in HCI on understanding the role of data tools for non-profits and grassroots communities, there has been limited focus on cooperatives. This paper examines the role of financial, social, and building upkeep data in a non-profit cooperative housing organization in Toronto, Canada (alias named NXI). Through a 16-month-long ethnographic study, including 24 interviews, we investigate the role of and tensions in data practices related to NXI’s daily maintenance and operations, cooperation, and sustainability. We find that NXI’s current data practices are functional and meaningful—sometimes requiring team workarounds—in the short term. However, various tensions and deficiencies in data practices hamper NXI’s sustainability. By contextualizing the temporal affordances of data, we propose design implications for data tools to align effectively with the practices of cooperatives and enhance organizational sustainability. Finally, we discuss how data designers and researchers, organizations or grassroots communities, and financial technology designers can benefit from our work, especially with regard to the maintenance and sustainability of small-scale organizations.
Data wrangling is a time-consuming and challenging task in the early stages of a data science pipeline. However, existing tools often fail to effectively interpret user intent. We propose Dango, a mixed-initiative multi-agent system that helps users generate data wrangling scripts. Compared to existing tools, Dango enhances user communication of intent by: (1) allowing users to demonstrate on multiple tables and use natural language prompts in a conversation interface, (2) enabling users to clarify their intent by answering LLM-posed multiple-choice clarification questions, and (3) providing multiple forms of feedback such as step-by-step NL explanations and data provenance to help users evaluate the data wrangling scripts. In a within-subjects, think-aloud study (n=38), the results show that Dango's features can significantly improve intent clarification, accuracy, and efficiency in data wrangling tasks.
Ridgeline plots are frequently employed to visualize the evolution or distributions of multiple series with a pile of overlapping line, area, or bar charts, highlighting the peak patterns. While traditionally viewed as small multiple visualizations, their ridge-like patterns have increasingly attracted graphic designers to create appealing customized ridgeline plots. However, many tools only support creating basic ridgeline plots and overlook their diverse layouts and styles. This paper introduces a comprehensive design space for ridgeline plots, focusing on their varied layouts and expressive styles. We present RidgeBuilder, an intuitive tool for creating expressive ridgeline plots with customizable layouts and styles. In particular, we summarize three goals for refining the layout of ridgeline plots and propose an optimization method. We assess RidgeBuilder's usability and usefulness through a reproduction study and evaluate the layout optimization algorithm through anonymized questionnaires. The effectiveness is demonstrated with a gallery of ridgeline plots created by RidgeBuilder.
Data profiling plays a critical role in understanding the structure of complex datasets and supporting numerous downstream tasks, such as social media analytics and financial fraud detection. While existing research predominantly focuses on structured data formats, a substantial portion of semi-structured textual data still requires ad-hoc and arduous manual profiling to extract and comprehend its internal structures. In this work, we propose StructVizor, an interactive profiling system that facilitates sensemaking and transformation of semi-structured textual data. Our tool mainly addresses two challenges: a) extracting and visualizing the diverse structural patterns within data, such as how information is organized or related, and b) enabling users to efficiently perform various wrangling operations on textual data. Through automatic data parsing and structure mining, StructVizor enables visual analytics of structural patterns, while incorporating novel interactions to enable profile-based data wrangling. A comparative user study involving 12 participants demonstrates the system's usability and its effectiveness in supporting exploratory data analysis and transformation tasks.