Engineering Development Support

https://doi.org/10.1145/3411764.3445527

Computational notebooks, which seamlessly interleave code with results, have become a popular tool for data scientists due to the iterative nature of exploratory tasks. However, notebooks provide a single execution state, which users manipulate by creating and modifying variables. When exploring alternatives, data scientists must carefully create many-step manipulations in visually distant cells. We conducted formative interviews with 6 professional data scientists, motivating design principles behind exposing multiple states. We introduce forking (creating a new interpreter session) and backtracking (navigating through previous states). We implement these interactions as an extension to notebooks that helps data scientists more directly express and navigate through decision points in a single notebook. In a qualitative evaluation, 11 professional data scientists found the tool would be useful for exploring alternatives and debugging code to create a predictive model. Their insights highlight further challenges to scaling this functionality.
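As a rough illustration of the two interactions named in this abstract, the sketch below models a session as a dictionary of variables that is snapshotted after every cell. It is not the extension's implementation: the class `StateHistory` and its methods are hypothetical names, and deep-copying a namespace is an assumption that only works for simple values (it would fail for modules, open files, and similar objects).

```python
import copy


class StateHistory:
    """Toy model of a notebook session (hypothetical, for illustration only)."""

    def __init__(self):
        self.namespace = {}   # variables of the live session
        self.snapshots = []   # one saved copy of the variables per executed cell

    def _copy_vars(self):
        # Copy user variables only; skip the __builtins__ entry that exec adds.
        return {k: copy.deepcopy(v)
                for k, v in self.namespace.items() if k != "__builtins__"}

    def run(self, source):
        # Execute a cell against the live namespace, then snapshot the result.
        exec(source, self.namespace)
        self.snapshots.append(self._copy_vars())

    def backtrack(self, step):
        # Restore the variables as they were right after cell number `step`.
        self.namespace = copy.deepcopy(self.snapshots[step])
        self.snapshots = self.snapshots[: step + 1]

    def fork(self):
        # Create an independent session that starts from the current state.
        child = StateHistory()
        child.namespace = self._copy_vars()
        child.snapshots = copy.deepcopy(self.snapshots)
        return child


main = StateHistory()
main.run("data = [3, 1, 2]")
main.run("data.sort()")

alt = main.fork()            # explore an alternative without touching `main`
alt.run("data = data[::-1]")

main.backtrack(0)            # return `main` to its state after the first cell
print(main.namespace["data"], alt.namespace["data"])
```

After the fork, changes in `alt` leave `main` untouched, and backtracking `main` discards the effect of the in-place sort.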

University of California, Berkeley, Berkeley, California, United States

Microsoft Research, Redmond, Washington, United States

Microsoft, Redmond, Washington, United States

Microsoft Corp, Redmond, Washington, United States

10.1145/3411764.3445527

https://doi.org/10.1145/3411764.3445048

Code search is an important and frequent activity for developers using computational notebooks (e.g., Jupyter). The flexibility of notebooks brings challenges for effective code search, where classic search interfaces for traditional software code may be limited. In this paper, we propose NBSearch, a novel system that supports semantic code search in notebook collections and interactive visual exploration of search results. NBSearch leverages advanced machine learning models to enable natural language search queries and intuitive visualizations to present complicated intra- and inter-notebook relationships in the returned results. We developed NBSearch through an iterative participatory design process with two experts from a large software company. We evaluated the models with a series of experiments and the whole system with a controlled user study. The results indicate the feasibility of our analytical pipeline and the effectiveness of NBSearch to support code search in large notebook collections.

University of Waterloo, Waterloo, Ontario, Canada

Arizona State University, Tempe, Arizona, United States

Uber Technologies, Inc., San Francisco, California, United States

University of Waterloo, Waterloo, Ontario, Canada

10.1145/3411764.3445048

https://doi.org/10.1145/3411764.3445267

Researchers have explored several avenues to mitigate data scientists' frustrations with computational notebooks, including: (1) live programming, to keep notebook results consistent and up to date; (2) supplementing scripting with graphical user interfaces (GUIs), to improve ease of use; and (3) providing domain-specific languages (DSLs), to raise a script's level of abstraction. This paper introduces Glinda, which combines these three approaches by providing a live programming experience, with interactive results, for a domain-specific language for data science. The language's compiler uses an open-ended set of "recipes" to execute steps in the user's data science workflow. Each recipe is intended to combine the expressiveness of a written notation with the ease-of-use of a GUI. Live programming provides immediate feedback to a user's input, whether in the form of program edits or GUI gestures. In a qualitative evaluation with 12 professional data scientists, participants highly rated the live programming and interactive results. They found the language productive and sufficiently expressive and suggested opportunities to extend it.

Microsoft Corp, Redmond, Washington, United States

10.1145/3411764.3445267

https://doi.org/10.1145/3411764.3445538

Training deep neural networks can generate non-descriptive error messages or produce unusual output without any explicit errors at all. While experts rely on tacit knowledge to apply debugging strategies, non-experts lack the experience required to interpret model output and correct Deep Learning (DL) programs. In this work, we identify DL debugging heuristics and strategies used by experts, and use them to guide the design of Umlaut. Umlaut checks DL program structure and model behavior against these heuristics; provides human-readable error messages to users; and annotates erroneous model output to facilitate error correction. Umlaut links code, model output, and tutorial-driven error messages in a single interface. We evaluated Umlaut in a study with 15 participants to determine its effectiveness in helping developers find and fix errors in their DL programs. Participants using Umlaut found and fixed significantly more bugs compared to a baseline condition.

University of California, Berkeley, Berkeley, California, United States

UC Berkeley, Berkeley, California, United States

10.1145/3411764.3445538

https://doi.org/10.1145/3411764.3445265

End-user programmers opportunistically copy-and-paste code snippets from colleagues or the web to accomplish their tasks. Unfortunately, these snippets often don't work verbatim, so these people, who are non-specialists in the programming language, make guesses and tweak the code to understand and apply it successfully. To support their desired workflow and facilitate tweaking and understanding, we built a prototype tool, TweakIt, that provides users with a familiar live interaction to help them understand, introspect, and reify how different code snippets would transform their data. In a usability study with 14 data analysts, participants found the tool useful for understanding the function of otherwise unfamiliar code, increasing their confidence about what the code does, identifying relevant parts of code specific to their task, and proactively exploring and evaluating code. Overall, our participants were enthusiastic about incorporating the tool in their own day-to-day work.

UC San Diego, La Jolla, California, United States

Microsoft Research, Cambridge, United Kingdom

Microsoft, Redmond, Washington, United States

Microsoft Research, Cambridge, United Kingdom

10.1145/3411764.3445265

https://doi.org/10.1145/3411764.3445567

Trigger-action programming (if-this-then-that rules) empowers non-technical users to automate services and smart devices. As a user's set of trigger-action programs evolves, the user must reason about behavior differences between similar programs, such as between an original program and several modification candidates, to select programs that meet their goals. To facilitate this process, we co-designed user interfaces and underlying algorithms to highlight differences between trigger-action programs. Our novel approaches leverage formal methods to efficiently identify and visualize differences in program outcomes or abstract properties. We also implemented a traditional interface that shows only syntax differences in the rules themselves. In a between-subjects online experiment with 107 participants, the novel interfaces better enabled participants to select trigger-action programs matching intended goals in complex, yet realistic, situations that proved very difficult when using traditional interfaces showing syntax differences.
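To make the contrast between syntax differences and outcome differences concrete, here is a minimal sketch that compares two trigger-action programs by brute-force enumeration of situations. This is an assumption-laden stand-in: the abstract says the authors use formal methods, whereas this example simply tries every combination, and the rule format, the function names, and the example devices are all hypothetical.

```python
from itertools import product


# Hypothetical rule format: (trigger event, required conditions, (device, new value)).
def run_program(rules, event, state):
    """Apply every rule whose trigger and conditions match; return the new state."""
    result = dict(state)
    for trigger, conditions, (device, value) in rules:
        if trigger == event and all(state[k] == v for k, v in conditions.items()):
            result[device] = value
    return result


def outcome_differences(prog_a, prog_b, events, variables):
    """Enumerate every (event, state) pair and collect those where outcomes differ."""
    names = sorted(variables)
    diffs = []
    for event in events:
        for values in product(*(variables[n] for n in names)):
            state = dict(zip(names, values))
            out_a = run_program(prog_a, event, state)
            out_b = run_program(prog_b, event, state)
            if out_a != out_b:
                diffs.append((event, state, out_a, out_b))
    return diffs


# Original: motion always turns the light on.
original = [("motion", {}, ("light", "on"))]
# Modification candidate: motion turns the light on only at night.
modified = [("motion", {"night": True}, ("light", "on"))]

diffs = outcome_differences(
    original,
    modified,
    events=["motion", "door_open"],
    variables={"night": [True, False], "light": ["on", "off"]},
)
for event, state, out_a, out_b in diffs:
    print(event, state, "->", out_a["light"], "vs", out_b["light"])
```

In this toy case the two programs differ in only one situation (motion during the day while the light is off), which a textual diff of the rules would not state directly.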

University of Chicago, Chicago, Illinois, United States

National University of Singapore, Singapore, Singapore

Brown University, Providence, Rhode Island, United States

University of Chicago, Chicago, Illinois, United States

10.1145/3411764.3445567

https://doi.org/10.1145/3411764.3445654

Many programmers want to use deep learning due to its superior accuracy in many challenging domains. Yet our formative study with ten programmers indicated that, when constructing their own deep neural networks (DNNs), they often had a difficult time choosing appropriate model structures and hyperparameter values. This paper presents ExampleNet, a novel interactive visualization system for exploring common and uncommon design choices in a large collection of open-source DNN projects. ExampleNet provides a holistic view of the distribution over model structures and hyperparameter settings in the corpus of DNNs, so users can easily filter the corpus down to projects tackling similar tasks and compare design choices made by others. We evaluated ExampleNet in a within-subjects study with sixteen participants. Compared with the control condition (i.e., online search), participants using ExampleNet were able to inspect more online examples, make more data-driven design decisions, and make fewer design mistakes.

Harvard University, Cambridge, Massachusetts, United States

10.1145/3411764.3445654

https://doi.org/10.1145/3411764.3445682

This paper explores software’s role in visual art production by examining how artists use and develop software. We conducted interviews with professional artists who were collaborating with software developers, learning software development, and building and maintaining software. We found artists were motivated to learn software development for intellectual growth and access to technical communities. Artists valued efficient workflows through skilled manual execution and personal software development, but avoided high-level forms of software automation. Artists identified conflicts between their priorities and those of professional developers and computational art communities, which influenced how they used computational aesthetics in their work. These findings contribute to efforts in systems engineering research to integrate end-user programming and creativity support across software and physical media, suggesting opportunities for artists as collaborators. Artists’ experiences writing software can guide technical implementations of domain-specific representations, and their experiences in interdisciplinary production can aid inclusive community building around computational tools.

Stanford University, Stanford, California, United States

University of California Santa Barbara, Santa Barbara, California, United States

10.1145/3411764.3445682

https://doi.org/10.1145/3411764.3445326

User interface design is a complex task that involves designers examining a wide range of options. We present Spacewalker, a tool that allows designers to rapidly search a large design space for an optimal web UI with integrated support. Designers first annotate each attribute they want to explore in a typical HTML page, using a simple markup extension we designed. Spacewalker then parses the annotated HTML specification, and intelligently generates and distributes various configurations of the web UI to crowd workers for evaluation. We enhanced a genetic algorithm to accommodate crowd worker responses from pairwise comparison of UI designs, which is crucial for obtaining reliable feedback. Based on our experiments, Spacewalker allows designers to effectively search a large design space of a UI, using the language they are familiar with, and improve their design rapidly at a minimal cost.

University of Washington, Seattle, Washington, United States

Google Research, Mountain View, California, United States

10.1145/3411764.3445326

https://doi.org/10.1145/3411764.3445043

Reverse engineering (RE) of user interfaces (UIs) plays an important role in software evolution. However, the large diversity of UI technologies and the need for UIs to be resizable make this challenging. We propose ReverseORC, a novel RE approach able to discover diverse layout types and their dynamic resizing behaviours independently of their implementation, and to specify them by using OR constraints. Unlike previous RE approaches, ReverseORC infers flexible layout constraint specifications by sampling UIs at different sizes and analyzing the differences between them. It can create specifications that replicate even some non-standard layout managers with complex dynamic layout behaviours. We demonstrate that ReverseORC works across different platforms with very different layout approaches, e.g., for GUIs as well as for the Web. Furthermore, it can be used to detect and fix problems in legacy UIs, extend UIs with enhanced layout behaviours, and support the creation of flexible UI layouts.

Max Planck Institute for Informatics, Saarbrücken, Germany

Simon Fraser University, Vancouver, British Columbia, Canada

University of Bath, Bath, United Kingdom

10.1145/3411764.3445043

https://doi.org/10.1145/3411764.3445765

This paper presents GestureMap, a visual analytics tool for gesture elicitation which directly visualises the space of gestures. Concretely, a Variational Autoencoder embeds gestures recorded as 3D skeletons on an interactive 2D map. GestureMap further integrates three computational capabilities to connect exploration to quantitative measures: Leveraging DTW Barycenter Averaging (DBA), we compute average gestures to 1) represent gesture groups at a glance; 2) compute a new consensus measure (variance around average gesture); and 3) cluster gestures with k-means. We evaluate GestureMap and its concepts with eight experts and an in-depth analysis of published data. Our findings show how GestureMap facilitates exploring large datasets and helps researchers to gain a visual understanding of elicited gesture spaces. It further opens new directions, such as comparing elicitations across studies. We discuss implications for elicitation studies and research, and opportunities to extend our approach to additional tasks in gesture elicitation.
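The consensus measure mentioned here (variance around an average gesture) can be sketched in a few lines. The example below is a simplification under stated assumptions, not the GestureMap pipeline: it uses a plain dynamic time warping (DTW) distance and, instead of DTW Barycenter Averaging, picks the medoid (the recorded gesture closest to all others) as the representative gesture. The function names and the toy 2D trajectories are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of points."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def medoid(gestures):
    """Stand-in for a DBA average: the gesture with the smallest total DTW distance to the rest."""
    return min(gestures, key=lambda g: sum(dtw_distance(g, other) for other in gestures))


def consensus_variance(gestures):
    """Mean squared DTW distance to the representative gesture; lower means more agreement."""
    center = medoid(gestures)
    return sum(dtw_distance(g, center) ** 2 for g in gestures) / len(gestures)


# Same motion performed at different speeds: DTW aligns them perfectly.
tight = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)],
]
# Two clearly different motions.
loose = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (6.0, 6.0)],
]

print(consensus_variance(tight), consensus_variance(loose))
```

Because DTW tolerates differences in timing, the first group scores zero variance even though its two recordings have different lengths, while the second group scores higher.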

University of Bayreuth, Bayreuth, Germany

10.1145/3411764.3445765

https://doi.org/10.1145/3411764.3445457

Due to advances in deep learning, gestures have become a more common tool for human-computer interaction. When trained on a large amount of data, deep learning models show remarkable performance in gesture recognition. Since it is expensive and time consuming to collect gesture data from people, we are often confronted with a practicality issue when managing the quantity and quality of training data. It is well known that increasing training data variability can help to improve the generalization performance of machine learning models. Thus, we directly intervene in the collection of gesture data to increase human gesture variability by adding some words (called styling words) into the data collection instructions, e.g., giving the instruction "perform gesture #1 faster" as opposed to "perform gesture #1." Through an in-depth analysis of gesture features and video-based gesture recognition, we have confirmed the advantageous use of styling words in gesture training data collection.

Gwangju Institute of Science and Technology, Gwangju, Republic of Korea

10.1145/3411764.3445457

https://doi.org/10.1145/3411764.3445784

This paper contributes the first large-scale dataset of 17,979 hand-drawn sketches of 21 UI element categories collected from 967 participants, including UI/UX designers, front-end developers, and HCI and CS grad students, from 10 different countries. We performed a perceptual study with this dataset and found that UI/UX designers can recognize the UI element sketches with ~96% accuracy. To compare human performance against computational recognition methods, we trained state-of-the-art DNN-based image classification models to recognize the UI element sketches. This study revealed that the ResNet-152 model outperforms other classification networks and detects unknown UI element sketches with 91.77% accuracy (chance is 4.76%). We have open-sourced the entire dataset of UI element sketches to the community, intending to pave the way for further research in utilizing AI to assist the conversion of lo-fi UI sketches to higher fidelities.
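The quoted chance level follows directly from the number of categories, assuming a classifier that guesses uniformly at random among the 21 classes (the uniform-guess assumption is ours; the abstract states only the resulting figure):

```python
categories = 21

# A uniform random guess is right once in every `categories` attempts.
chance = 1 / categories

print(f"{chance:.2%}")
```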

RWTH Aachen University, Aachen, NRW, Germany

RWTH Aachen University, Aachen, Germany

10.1145/3411764.3445784