Playing with Data

Conference
CHI 2025
DataSentry: Building Missing Data Management System for In-the-Wild Mobile Sensor Data Collection through Multi-Year Iterative Design Approach
Abstract

Mobile sensor data collection in people’s daily lives is essential for understanding fine-grained human behaviors. However, in-the-wild data collection often results in missing data due to participant and system-related issues. While existing monitoring systems in the mobile sensing field provide an opportunity to detect missing data, they fall short in monitoring data across many participants and sensors and diagnosing the root causes of missing data, accounting for heterogeneous sensing characteristics of mobile sensor data. To address these limitations, we undertook a multi-year iterative design process to develop a system for monitoring missing data in mobile sensor data collection. Our final prototype, DataSentry, enables the detection, diagnosis, and addressing of missing data issues across many participants and sensors, considering both within- and between-person variability. Based on the iterative design process, we share our experiences, lessons learned, and design implications for developing advanced missing data management systems.

Authors
Yugyeong Jung
KAIST, Daejeon, Republic of Korea
Hei Yiu Law
KAIST, Daejeon, Republic of Korea
Hadong Lee
Seoul National University, Seoul, Republic of Korea
Junmo Lee
KAIST, Daejeon, Republic of Korea
Bongshin Lee
Yonsei University, Seoul, Republic of Korea
Uichin Lee
KAIST, Daejeon, Republic of Korea
DOI

10.1145/3706598.3713314

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713314

How To Draw Commands? An Elicitation Study for Sketching on Spreadsheets
Abstract

Sketching is one of the oldest techniques humans use to express themselves. We sketch to visualize concepts, externalize memory, and communicate ideas. However, we barely use sketching to interact with computers. Given how naturally sketching comes to humans, we believe untapped potential exists in being able to simply draw commands onto a user interface. In this paper, we present results of an elicitation study about expressing common operations in spreadsheets through sketching. Spreadsheets are an interesting class of applications because they are widely used, support complex data and operations, and are available on touch-enabled devices. Our results show that despite considerable variation in syntactic details, participants gravitate towards recurring patterns (e.g., enclosures and arrows, examples and cross-references, and temporal sequences of strokes). The sketch patterns we identified can be a first step towards developing interpreters of sketched commands, and thus enable new means of interacting with spreadsheets and other applications.

Award
Honorable Mention
Authors
Marc Hesenius
University of Duisburg-Essen, Essen, Germany
Mak Krvavac
University of Duisburg-Essen, Essen, Germany
Valbjörn Jón Valbjörnsson
University of Iceland, Reykjavík, Iceland
Theresia Mita Erika
University of Iceland, Reykjavík, Iceland
Matthias Book
University of Iceland, Reykjavík, Iceland
DOI

10.1145/3706598.3715269

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3715269

Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Abstract

Data augmentation is crucial to make machine learning models more robust and safe. However, augmenting data can be challenging as it requires generating diverse data points to rigorously evaluate model behavior on edge cases and mitigate potential harms. Creating high-quality augmentations that cover these "unknown unknowns" is a time- and creativity-intensive task. In this work, we introduce Amplio, an interactive tool to help practitioners navigate "unknown unknowns" in unstructured text datasets and improve data diversity by systematically identifying empty data spaces to explore. Amplio includes three human-in-the-loop data augmentation techniques: Augment with Concepts, Augment by Interpolation, and Augment with Large Language Model. In a user study with 18 professional red teamers, we demonstrate the utility of our augmentation methods in helping generate high-quality, diverse, and relevant model safety prompts. We find that Amplio enabled red teamers to augment data quickly and creatively, highlighting the transformative potential of interactive augmentation workflows.

Award
Honorable Mention
Authors
Catherine Yeh
Harvard University, Boston, Massachusetts, United States
Donghao Ren
Apple, Seattle, Washington, United States
Yannick Assogba
Apple, Cambridge, Massachusetts, United States
Dominik Moritz
Apple, Pittsburgh, Pennsylvania, United States
Fred Hohman
Apple, Seattle, Washington, United States
DOI

10.1145/3706598.3713491

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713491

Xavier: Toward Better Coding Assistance in Authoring Tabular Data Wrangling Scripts
Abstract

Data analysts frequently employ code completion tools in writing custom scripts to tackle complex tabular data wrangling tasks. However, existing tools do not sufficiently link the data contexts such as schemas and values with the code being edited. This not only leads to poor code suggestions, but also frequent interruptions in coding processes as users need additional code to locate and understand relevant data. We introduce Xavier, a tool designed to enhance data wrangling script authoring in computational notebooks. Xavier maintains users' awareness of data contexts while providing data-aware code suggestions. It automatically highlights the most relevant data based on the user's code, integrates both code and data contexts for more accurate suggestions, and instantly previews data transformation results for easy verification. To evaluate the effectiveness and usability of Xavier, we conducted a user study with 16 data analysts, showing its potential to streamline data wrangling scripts authoring.

Authors
Yunfan Zhou
Zhejiang University, Hangzhou, Zhejiang, China
Xiwen Cai
Zhejiang University, Hangzhou, Zhejiang, China
Qiming Shi
Zhejiang University, Hangzhou, Zhejiang, China
Yanwei Huang
Zhejiang University, Hangzhou, Zhejiang, China
Haotian Li
Microsoft Research Asia, Beijing, China
Huamin Qu
The Hong Kong University of Science and Technology, Hong Kong, China
Di Weng
Zhejiang University, Ningbo, Zhejiang, China
Yingcai Wu
Zhejiang University, Hangzhou, Zhejiang, China
DOI

10.1145/3706598.3714239

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714239

Divisi: Interactive Search and Visualization for Scalable Exploratory Subgroup Analysis
Abstract

Analyzing data subgroups is a common data science task to build intuition about a dataset and identify areas to improve model performance. However, subgroup analysis is prohibitively difficult in datasets with many features, and existing tools limit unexpected discoveries by relying on user-defined or static subgroups. We propose exploratory subgroup analysis as a set of tasks in which practitioners discover, evaluate, and curate interesting subgroups to build understanding about datasets and models. To support these tasks we introduce Divisi, an interactive notebook-based tool underpinned by a fast approximate subgroup discovery algorithm. Divisi's interface allows data scientists to interactively re-rank and refine subgroups and to visualize their overlap and coverage in the novel Subgroup Map. Through a think-aloud study with 13 practitioners, we find that Divisi can help uncover surprising patterns in data features and their interactions, and that it encourages more thorough exploration of subtypes in complex data.

Authors
Venkatesh Sivaraman
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Zexuan Li
University of Michigan, Ann Arbor, Michigan, United States
Adam Perer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
DOI

10.1145/3706598.3713103

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713103

TableCanoniser: Interactive Grammar-Powered Transformation of Messy, Non-Relational Tables to Canonical Tables
Abstract

TableCanoniser is a declarative grammar and interactive system for constructing relational tables from messy tabular inputs such as spreadsheets. We propose the concept of axis alignment to categorise input types and characterise the expanded scope of our system relative to existing tools. The declarative grammar consists of match conditions, which specify repeating patterns of input cells, and extract operations, which specify how matched values map to the output table. In the interactive interface, users can specify match and extract patterns by interacting with an input table, or author more advanced specifications in the coding panel. To refine and verify specifications, users interact with grammar-based provenance visualisations such as linked highlighting of input and output values, tree-based visualisation of matching patterns, and a mini-map overview of matched instances of patterns with annotations showing where cells are extracted to. We motivate and illustrate our work with real-world usage scenarios and workflows.

Award
Honorable Mention
Authors
Kai Xiong
Zhejiang University, Hangzhou, Zhejiang, China
Cynthia A. Huang
Monash University, Melbourne, Victoria, Australia
Michael Wybrow
Monash University, Melbourne, Victoria, Australia
Yingcai Wu
Zhejiang University, Hangzhou, Zhejiang, China
DOI

10.1145/3706598.3714321

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714321

Emerging Data Practices: Data Work in the Era of Large Language Models
Abstract

Data is one of the foundational aspects of making Artificial Intelligence (AI) work as intended. As large language models (LLMs) become the epicenter of AI, it is crucial to understand better how the datasets that maintain such models are created. The emergent nature of LLMs makes it critical to understand the challenges practitioners developing Gen AI technologies face to design alternatives for better responding to Gen AI's ethical issues. In this paper, we provide such understanding by reporting on 25 interviews with practitioners who handle data in three distinct development stages of different LLMs. Our contributions are (1) empirical evidence of how uncertainty, data practices, and reliance mechanisms change across LLMs' development cycle; (2) how the unique qualities of LLMs impact data practices and their implications for the future of Gen AI technologies; and (3) three opportunities for HCI researchers interested in supporting practitioners developing Gen AI technologies.

Authors
Adriana Alvarado Garcia
IBM Research, Yorktown Heights, New York, United States
Heloisa Candello
IBM Research, Sao Paulo, Brazil
Karla Badillo-Urquiola
University of Notre Dame, South Bend, Indiana, United States
Marisol Wong-Villacres
Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador
DOI

10.1145/3706598.3714069

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714069
