Diff in the Loop: Supporting Data Comparison in Exploratory Data Analysis

Abstract

Data science is characterized by evolution: since data science is exploratory, results evolve from moment to moment; since it can be collaborative, results evolve as the work changes hands. While existing tools help data scientists track changes in code, they provide less support for understanding the iterative changes that the code produces in the data. We explore the idea of visualizing differences in datasets as a core feature of exploratory data analysis, a concept we call Diff in the Loop (DITL). We evaluated DITL in a user study with 16 professional data scientists and found it helped them understand the implications of their actions when manipulating data. We summarize these findings and discuss how the approach can be generalized to different data science workflows.

Authors
April Yi Wang
University of Michigan, Ann Arbor, Michigan, United States
Will Epperson
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Robert A. DeLine
Microsoft Corp, Redmond, Washington, United States
Steven M. Drucker
Microsoft Research, Redmond, Washington, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502123

Conference: CHI 2022

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2022.acm.org/)

Session: Interacting with Data

291
5 presentations
2022-05-03 18:00:00
2022-05-03 19:15:00