Computational notebooks & tutorials

Paper session

Conference name
CHI 2020
What's Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities
Abstract

Computational notebooks — such as Azure, Databricks, and Jupyter — are a popular, interactive paradigm for data scientists to author code, analyze data, and interleave visualizations, all within a single document. Nevertheless, as data scientists incorporate more of their activities into notebooks, they encounter unexpected difficulties, or pain points, that impact their productivity and disrupt their workflow. Through a systematic, mixed-methods study using semi-structured interviews (n=20) and survey (n=156) with data scientists, we catalog nine pain points when working with notebooks. Our findings suggest that data scientists face numerous pain points throughout the entire workflow — from setting up notebooks to deploying to production — across many notebook environments. Our data scientists report essential notebook requirements, such as supporting data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.

Award
Honorable Mention
Keywords
Computational notebooks
challenges
data science
interviews
pain points
survey
Authors
Souti Chattopadhyay
Oregon State University, Corvallis, OR, USA
Ishita Prasad
Microsoft, Redmond, WA, USA
Austin Z. Henley
University of Tennessee–Knoxville, Knoxville, TN, USA
Anita Sarma
Oregon State University, Corvallis, OR, USA
Titus Barik
Microsoft, Redmond, WA, USA
DOI

10.1145/3313831.3376729

Paper URL

https://doi.org/10.1145/3313831.3376729

Callisto: Capturing the "Why" by Connecting Conversations with Computational Narratives
Abstract

When teams of data scientists collaborate on computational notebooks, their discussions often contain valuable insight into their design decisions. These discussions not only explain analysis in the current notebook but also alternative paths, which are often poorly documented. However, these discussions are disconnected from the notebooks for which they could provide valuable context. We propose Callisto, an extension to computational notebooks that captures and stores contextual links between discussion messages and notebook elements with minimal effort from users. Callisto allows notebook readers to better understand the current notebook content and the overall problem-solving process that led to it, by making it possible to browse the discussions and code history relevant to any part of the notebook. This is particularly helpful for onboarding new notebook collaborators to avoid misinterpretations and duplicated work, as we found in a two-stage evaluation with 32 data science students.

Award
Honorable Mention
Keywords
Computational Notebooks
Collaborative Systems
Data Science
Literate Programming
Authors
April Yi Wang
University of Michigan – Ann Arbor, Ann Arbor, MI, USA
Zihan Wu
Tsinghua University, Beijing, China
Christopher Brooks
University of Michigan – Ann Arbor, Ann Arbor, MI, USA
Steve Oney
University of Michigan – Ann Arbor, Ann Arbor, MI, USA
DOI

10.1145/3313831.3376740

Paper URL

https://doi.org/10.1145/3313831.3376740

Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists
Abstract

Data wrangling is a difficult and time-consuming activity in computational notebooks, and existing wrangling tools do not fit the exploratory workflow for data scientists in these environments. We propose a unified interaction model based on programming-by-example that generates readable code for a variety of useful data transformations, implemented as a Jupyter notebook extension called Wrex. User study results demonstrate that data scientists are significantly more effective and efficient at data wrangling with Wrex over manual programming. Qualitative participant feedback indicates that Wrex was useful and reduced barriers in having to recall or look up the usage of various data transform functions. The synthesized code allowed data scientists to verify the intended data transformation, increased their trust and confidence in Wrex, and fit seamlessly within their cell-based notebook workflows. This work suggests that presenting readable code to professional data scientists is an indispensable component of offering data wrangling tools in notebooks.

Award
Best Paper
Keywords
Computational Notebooks
Program Synthesis
Data Science
Authors
Ian Drosos
University of California, San Diego, La Jolla, CA, USA
Titus Barik
Microsoft, Redmond, WA, USA
Philip J. Guo
University of California, San Diego, La Jolla, CA, USA
Robert DeLine
Microsoft, Redmond, WA, USA
Sumit Gulwani
Microsoft, Redmond, WA, USA
DOI

10.1145/3313831.3376442

Paper URL

https://doi.org/10.1145/3313831.3376442

Composing Flexibly-Organized Step-by-Step Tutorials from Linked Source Code, Snippets, and Outputs
Abstract

Programming tutorials are a pervasive, versatile medium for teaching programming. In this paper, we report on the content and structure of programming tutorials, the pain points authors experience in writing them, and a design for a tool to help improve this process. An interview study with 12 experienced tutorial authors found that they construct documents by interleaving code snippets with text and illustrative outputs. It also revealed that authors must often keep related artifacts of source programs, snippets, and outputs consistent as a program evolves. A content analysis of 200 frequently-referenced tutorials on the web also found that most tutorials contain related artifacts—duplicate code and outputs generated from snippets—that an author would need to keep consistent with each other. To address these needs, we designed a tool called Torii with novel authoring capabilities. An in-lab study showed that tutorial authors can successfully use the tool for the unique affordances identified, and provides guidance for designing future tools for tutorial authoring.

Award
Honorable Mention
Keywords
Programming tutorials
literate programming
authoring
code evolution
consistency
code editors
Authors
Andrew Head
University of California, Berkeley, Berkeley, CA, USA
Jason Jiang
University of California, Berkeley, Berkeley, CA, USA
James Smith
University of California, Berkeley, Berkeley, CA, USA
Marti A. Hearst
University of California, Berkeley, Berkeley, CA, USA
Björn Hartmann
University of California, Berkeley, Berkeley, CA, USA
DOI

10.1145/3313831.3376798

Paper URL

https://doi.org/10.1145/3313831.3376798

Between Scripts and Applications: Computational Media for the Frontier of Nanoscience
Abstract

The popularity of computational notebooks heralds a return of software as computational media rather than turn-key applications. We believe this software model has potential beyond supporting just the computationally literate. We studied a biomolecular nano-design lab that works on a current frontier of science – RNA origami – whose researchers depend on computational tools to do their work, yet are not trained as programmers. Using a participatory design process, we developed a computational labbook to concretise what computational media could look like, using the principles of computability, malleability, shareability, and distributability suggested by previous work. We used this prototype to co-reflect with the nanoscientists about how it could transform their practice. We report on the computational culture specific to this research area; the scientists' struggles managing their computational environments; and their subsequent disempowerment yet dependence. Lastly, we discuss the generative potential and limitations of the four design principles for the future of computational media.

Keywords
Computational media
Computational notebook
Electronic laboratory notebook
Participatory design
Authors
Midas Nouwens
Aarhus University, Aarhus, Denmark
Marcel Borowski
Aarhus University, Aarhus, Denmark
Bjarke Fog
Aarhus University, Aarhus, Denmark
Clemens Nylandsted Klokmose
Aarhus University, Aarhus, Denmark
DOI

10.1145/3313831.3376287

Paper URL

https://doi.org/10.1145/3313831.3376287
