Working with Data

Conference
CHI 2023
Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time
Abstract

Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." However, little is known about how sensemaking behavior evolves from exploration and explanation during this process. This gap limits our ability to understand the full scope of sensemaking, which in turn inhibits the design of tools that support the process. We contribute the first mixed-methods approach to characterize how sensemaking evolves within computational notebooks. We study 2,574 Jupyter notebooks mined from GitHub by identifying data science notebooks that have undergone significant iterations, presenting a regression model that automatically characterizes sensemaking activity, and using this regression model to calculate and analyze shifts in activity across GitHub versions. Our results show that notebook authors participate in various sensemaking tasks over time, such as annotation, branching analysis, and documentation. We use our insights to recommend extensions to current notebook environments.

Award
Honorable Mention
Authors
Deepthi Raghunandan
University of Maryland, College Park, Maryland, United States
Aayushi Roy
University of Maryland, College Park, Maryland, United States
Shenzhi Shi
University of Maryland, College Park, Maryland, United States
Niklas Elmqvist
University of Maryland, College Park, College Park, Maryland, United States
Leilani Battle
University of Washington, Seattle, Washington, United States
Paper URL

https://doi.org/10.1145/3544548.3580997

Using Logs Data to Identify When Engineers Experience Flow or Focused Work
Abstract

Beyond self-report data, we lack reliable and non-intrusive methods for identifying flow. However, taking a step back and acknowledging that flow occurs during periods of focus gives us the opportunity to make progress towards measuring flow by isolating focused work. Here, we take a mixed-methods approach to design a logs-based metric that leverages machine learning and a comprehensive collection of logs data to identify periods of related actions (indicating focus), and validate this metric against self-reported time in focus or flow using diary data and quarterly survey data. Our results indicate that we can determine when software engineers at a large technology company experience focused work which includes instances of flow. This metric speaks to engineering work, but can be leveraged in other domains to non-disruptively measure when people experience focus. Future research can build upon this work to identify signals associated with other facets of flow.

Authors
Adam Brown
Google, New York, New York, United States
Sarah D'Angelo
Google, Seattle, Washington, United States
Ben Holtz
Google, Toronto, Ontario, Canada
Ciera Jaspan
Google, Mountain View, California, United States
Collin Green
Google, Mountain View, California, United States
Paper URL

https://doi.org/10.1145/3544548.3581562

Different Researchers, Different Results? Analyzing the Influence of Researcher Experience and Data Type During Qualitative Analysis of an Interview and Survey Study on Security Advice
Abstract

When conducting qualitative research, it is necessary to decide how many researchers should be involved in coding the data: Is one enough or are more coders beneficial? To offer empirical evidence for this question, we designed a series of studies investigating qualitative coding. We replicated and extended a usable security and privacy study by Ion et al. to gather both simple survey data and complex interview data. We had a total of 65 students and seven researchers analyze different parts of this data. We analyzed the codebook creation process, similarity of outcomes, inter-rater reliability, and compared the student to the researcher outcomes. We also surveyed five years of SOUPS-PC members about their views on coding. The reviewers' views on coding practices for complex and simple data are almost identical. However, our results suggest that the coding process can be different for the two types of data, with complex data benefiting more from interaction between coders.

Authors
Anna-Marie Ortloff
University of Bonn, Bonn, Germany
Matthias Fassl
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Alexander Ponticello
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Florin Martius
University of Bonn, Bonn, Germany
Anne Mertens
University of Bonn, Bonn, Germany
Katharina Krombholz
Saarland Informatics Campus, Saarbrücken, Germany
Matthew Smith
University of Bonn, Bonn, Germany
Paper URL

https://doi.org/10.1145/3544548.3580766

Mobilizing Social Media Data: Reflections of a Researcher Mediating between Data and Organization
Abstract

This paper examines the practices involved in mobilizing social media data from their site of production to the institutional context of non-profit organizations. We report on nine months of fieldwork with a transnational and intergovernmental organization using social media data to understand the role of grassroots initiatives in Mexico, in the unique context of the COVID-19 pandemic. We show how different stakeholders negotiate the definition of problems to be addressed with social media data, the collective creation of ground-truth, and the limitations involved in the process of extracting value from data. The meanings of social media data are not defined in advance; instead, they are contingent on the practices and needs of the organization that seeks to extract insights from the analysis. We conclude with a list of reflections and questions for researchers who mediate in the mobilization of social media data into non-profit organizations to inform humanitarian action.

Authors
Adriana Alvarado Garcia
IBM Research, Yorktown Heights, New York, United States
Marisol Wong-Villacres
Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador
Milagros Miceli
Technische Universität Berlin, Berlin, Germany
Benjamín Hernández
Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States
Christopher A. Le Dantec
Georgia Institute of Technology, Atlanta, Georgia, United States
Paper URL

https://doi.org/10.1145/3544548.3580916

A Need-Finding Study with Users of Geospatial Data
Abstract

Geospatial data is playing an increasingly critical role in the work of Earth and climate scientists, social scientists, and data journalists exploring spatiotemporal change in our environment and societies. However, existing software and programming tools for geospatial analysis and visualization are challenging to learn and difficult to use. The aim of this work is to identify the unmet computing needs of the diverse and expanding community of geospatial data users. We conducted a contextual inquiry study (n = 25) with domain experts using geospatial data in their current work. Through a thematic analysis, we found that participants struggled to (1) find and transform geospatial data to satisfy spatiotemporal constraints, (2) understand the behavior of geospatial operators, (3) track geospatial data provenance, and (4) explore the cartographic design space. These findings suggest design opportunities for developers and designers of geospatial analysis and visualization systems.

Authors
Parker Ziegler
University of California, Berkeley, Berkeley, California, United States
Sarah E. Chasins
University of California, Berkeley, Berkeley, California, United States
Paper URL

https://doi.org/10.1145/3544548.3581370

Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science
Abstract

The work involved in gathering, wrangling, cleaning, and otherwise preparing data for analysis is often the most time-consuming and tedious aspect of data work. Although many studies describe data preparation within the context of data science workflows, there has been little research on data preparation in data journalism. We address this gap with a hybrid form of thematic analysis that combines deductive codes derived from existing accounts of data science workflows and inductive codes arising from an interview study with 36 professional data journalists. We extend a previous model of data science work to incorporate detailed activities of data preparation. We synthesize 60 dirty data issues from 16 taxonomies on dirty data and our interview data, and we provide a novel taxonomy to characterize these dirty data issues as discrepancies between mental models. We also identify four challenges faced by journalists: diachronic, regional, fragmented, and disparate data sources.

Award
Honorable Mention
Authors
Stephen Kasica
University of British Columbia, Vancouver, British Columbia, Canada
Charles Berret
Linköping University, Linköping, Sweden
Tamara Munzner
University of British Columbia, Vancouver, British Columbia, Canada
Paper URL

https://doi.org/10.1145/3544548.3581271
