TRACTUS: Understanding and Supporting Source Code Experimentation in Hypothesis-Driven Data Science

Abstract

Data scientists experiment heavily with their code, compromising code quality to obtain insights faster. We observed ten data scientists perform hypothesis-driven data science tasks, and analyzed their coding, commenting, and analysis practice. We found that they have difficulty keeping track of their code experiments. When revisiting exploratory code to write production code later, they struggle to retrace their steps and capture the decisions made and insights obtained, and have to rerun code frequently. To address these issues, we designed TRACTUS, a system extending the popular RStudio IDE, that detects, tracks, and visualizes code experiments in hypothesis-driven data science tasks. TRACTUS helps recall decisions and insights by grouping code experiments into hypotheses, and structuring information like code execution output and documentation. Our user studies show how TRACTUS improves data scientists' workflows, and suggest additional opportunities for improvement. TRACTUS is available as an open source RStudio IDE addin at http://hci.rwth-aachen.de/tractus.

Keywords
Data Science
Programming IDE
Exploratory programming
Information visualization
Observational study
Authors
Krishna Subramanian
RWTH Aachen University, Aachen, Germany
Johannes Maas
RWTH Aachen University, Aachen, Germany
Jan Borchers
RWTH Aachen University, Aachen, Germany
DOI

10.1145/3313831.3376764

Paper URL

https://doi.org/10.1145/3313831.3376764

Conference: CHI 2020

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2020.acm.org/)

Session: Visualizing trees, networks & paths

Paper session
316A MAUI
5 presentations
2020-04-30 20:00:00
2020-04-30 21:15:00