Computational notebooks offer a flexible environment for exploratory data analysis (EDA), but this flexibility often leads to disorganized, iterative execution of notebook cells, making it difficult to track how data states evolve. Consequently, data scientists must devote extra mental effort to staying aware of data states, a task that is tedious and makes anomalies easy to overlook. To address this challenge, we developed NoteFlow, a notebook extension that uses charts as ``sight glasses'' to provide consistent and continuous tracing of data flow. NoteFlow allows users to (1) validate various facets of the current data state using charts recommended immediately after each cell execution, and (2) trace the global evolution of selected charts to continuously observe how particular data attributes change throughout the EDA process. We evaluated NoteFlow's effectiveness through a controlled study with 12 participants and a one-month field study with 2 data scientists on real-world workflows.
ACM CHI Conference on Human Factors in Computing Systems