Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science

Abstract

The work involved in gathering, wrangling, cleaning, and otherwise preparing data for analysis is often the most time-consuming and tedious aspect of data work. Although many studies describe data preparation within the context of data science workflows, there has been little research on data preparation in data journalism. We address this gap with a hybrid form of thematic analysis that combines deductive codes derived from existing accounts of data science workflows and inductive codes arising from an interview study with 36 professional data journalists. We extend a previous model of data science work to incorporate detailed activities of data preparation. We synthesize 60 dirty data issues from 16 taxonomies on dirty data and our interview data, and we provide a novel taxonomy to characterize these dirty data issues as discrepancies between mental models. We also identify four challenges faced by journalists: diachronic, regional, fragmented, and disparate data sources.

Award
Honorable Mention
Authors
Stephen Kasica
University of British Columbia, Vancouver, British Columbia, Canada
Charles Berret
Linköping University, Linköping, Sweden
Tamara Munzner
University of British Columbia, Vancouver, British Columbia, Canada
Paper URL

https://doi.org/10.1145/3544548.3581271

Conference: CHI 2023

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)

Session: Working with Data

Hall F
6 presentations
2023-04-25 23:30:00
2023-04-26 00:55:00