Tools for data scientists and Literature Reviews

Conference
CHI 2023
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
Abstract

When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature reviews. This paper introduces CiteSee, a paper reading tool that leverages a user's publishing, reading, and saving activities to provide personalized visual augmentations and context around citations. First, CiteSee connects the current paper to familiar contexts by surfacing known citations a user had cited or opened. Second, CiteSee helps users prioritize their exploration by highlighting relevant but unknown citations based on saving and reading history. We conducted a lab study that suggests CiteSee is significantly more effective for paper discovery than three baselines. A field deployment study shows CiteSee helps participants keep track of their explorations and leads to better situational awareness and increased paper discovery via inline citation when conducting real-world literature reviews.

Award
Best Paper
Authors
Joseph Chee Chang
Allen Institute for AI, Seattle, Washington, United States
Amy X. Zhang
University of Washington, Seattle, Washington, United States
Jonathan Bragg
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Andrew Head
University of Pennsylvania, Philadelphia, Pennsylvania, United States
Kyle Lo
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Doug Downey
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Daniel S. Weld
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Paper URL

https://doi.org/10.1145/3544548.3580847

DeepLens: Interactive Out-of-distribution Data Detection in NLP Models
Abstract

Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data should follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in the real-world data. Though many algorithms have been proposed to detect OOD data from text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DeepLens with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens accurately found nearly twice as many types of OOD issues, with 22% more confidence, compared with a variant of DeepLens that has no interaction or visualization support.

Authors
Da Song
University of Alberta, Edmonton, Alberta, Canada
Zhijie Wang
University of Alberta, Edmonton, Alberta, Canada
Yuheng Huang
University of Alberta, Edmonton, Alberta, Canada
Lei Ma
University of Alberta, Edmonton, Alberta, Canada
Tianyi Zhang
Purdue University, West Lafayette, Indiana, United States
Paper URL

https://doi.org/10.1145/3544548.3580741

Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design
Abstract

Machine learning practitioners often end up tunneling on low-level technical details like model architectures and performance metrics. Could early model development instead focus on high-level questions of which factors a model ought to pay attention to? Inspired by the practice of sketching in design, which distills ideas to their minimal representation, we introduce model sketching: a technical framework for iteratively and rapidly authoring functional approximations of a machine learning model's decision-making logic. Model sketching refocuses practitioner attention on composing high-level, human-understandable concepts that the model is expected to reason over (e.g., profanity, racism, or sarcasm in a content moderation task) using zero-shot concept instantiation. In an evaluation with 17 ML practitioners, model sketching reframed thinking from implementation to higher-level exploration, prompted iteration on a broader range of model designs, and helped identify gaps in the problem formulation—all in a fraction of the time ordinarily required to build a model.

Authors
Michelle S. Lam
Stanford University, Stanford, California, United States
Zixian Ma
Stanford University, San Francisco, California, United States
Anne Li
Stanford University, Stanford, California, United States
Izequiel Freitas
Stanford University, Stanford, California, United States
Dakuo Wang
Northeastern University, Boston, Massachusetts, United States
James A. Landay
Stanford University, Stanford, California, United States
Michael S. Bernstein
Stanford University, Stanford, California, United States
Paper URL

https://doi.org/10.1145/3544548.3581290

ComLittee: Literature Discovery with Personal Elected Author Committees
Abstract

In order to help scholars understand and follow a research topic, significant research has been devoted to creating systems that help scholars discover relevant papers and authors. Recent approaches have shown the usefulness of highlighting relevant authors while scholars engage in paper discovery. However, these systems do not capture and utilize users’ evolving knowledge of authors. We reflect on the design space and introduce ComLittee, a literature discovery system that supports author-centric exploration. In contrast to paper-centric interaction in prior systems, ComLittee’s author-centric interaction supports curating research threads from individual authors, finding new authors and papers using combined signals from a paper recommender and the curated authors’ authorship graphs, and understanding them in the context of those signals. In a within-subjects experiment that compares to a paper-centric discovery system with author-highlighting, we demonstrate how ComLittee improves author and paper discovery.

Authors
Hyeonsu B. Kang
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Nouran Soliman
Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Matt Latzke
Allen Institute for AI, Seattle, Washington, United States
Joseph Chee Chang
Semantic Scholar, Seattle, Washington, United States
Jonathan Bragg
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Paper URL

https://doi.org/10.1145/3544548.3581371

Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections
Abstract

Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers. As scientific literature grows, this becomes increasingly challenging. Meanwhile, authors summarize prior research in papers’ related work sections, though this is scoped to support a single paper. A formative study found that while reading multiple related work paragraphs helps overview a topic, it is hard to navigate overlapping and diverging references and research foci. In this work, we design a system, Relatedly, that scaffolds exploring and reading multiple related work paragraphs on a topic, with features including dynamic re-ranking and highlighting to spotlight unexplored dissimilar information, auto-generated descriptive paragraph headings, and low-lighting of redundant information. From a within-subjects user study (n=15), we found that scholars generate more coherent, insightful, and comprehensive topic outlines using Relatedly compared to a baseline paper list.

Authors
Srishti Palani
University of California, San Diego, California, United States
Aakanksha Naik
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Doug Downey
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Amy X. Zhang
University of Washington, Seattle, Washington, United States
Jonathan Bragg
Allen Institute for Artificial Intelligence, Seattle, Washington, United States
Joseph Chee Chang
Semantic Scholar, Seattle, Washington, United States
Paper URL

https://doi.org/10.1145/3544548.3580841

DeepSeer: Interactive RNN Explanation and Debugging via State Abstraction
Abstract

Recurrent Neural Networks (RNNs) have been widely used in Natural Language Processing (NLP) tasks given their superior performance on processing sequential data. However, it is challenging to interpret and debug RNNs due to their inherent complexity and lack of transparency. While many explainable AI (XAI) techniques have been proposed for RNNs, most of them only support local explanations rather than global explanations. In this paper, we present DeepSeer, an interactive system that provides both global and local explanations of RNN behavior in multiple tightly coordinated views for model understanding and debugging. The core of DeepSeer is a state abstraction method that bundles semantically similar hidden states in an RNN model and abstracts the model as a finite state machine. Users can explore the global model behavior by inspecting text patterns associated with each state and the transitions between states. Users can also dive into individual predictions by inspecting the state trace and intermediate prediction results of a given input. A between-subjects user study with 28 participants shows that, compared with a popular XAI technique, LIME, participants using DeepSeer made a deeper and more comprehensive assessment of RNN model behavior, identified the root causes of incorrect predictions more accurately, and came up with more actionable plans to improve the model performance.
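
The state abstraction idea in this abstract can be illustrated with a minimal sketch. This is not the DeepSeer implementation: the abstract does not say how hidden states are bundled, so the sketch assumes a tiny k-means clustering as a stand-in, and the function names (`nearest`, `kmeans`, `abstract_traces`) and the toy 2-D hidden states are hypothetical. It maps each hidden state to a cluster id (an abstract state) and counts transitions between consecutive states, which gives the edges of a finite state machine.

```python
from collections import Counter

def nearest(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])))

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (an assumed stand-in); seeds with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def abstract_traces(traces, k):
    """Map hidden-state traces to abstract state ids and count state transitions."""
    points = [h for trace in traces for h in trace]
    centroids = kmeans(points, k)
    state_traces = [[nearest(h, centroids) for h in trace] for trace in traces]
    transitions = Counter()
    for st in state_traces:
        transitions.update(zip(st, st[1:]))  # consecutive (from_state, to_state) pairs
    return state_traces, transitions

# Toy 2-D "hidden states" for two input sequences (made-up numbers).
traces = [
    [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0)],
    [(1.0, 0.9), (0.0, 0.0)],
]
state_traces, transitions = abstract_traces(traces, k=2)
print(state_traces)
print(dict(transitions))
```

In this sketch, `state_traces` plays the role of the per-input state trace mentioned in the abstract, and `transitions` holds the edge counts of the abstracted machine. A real system would work on high-dimensional RNN hidden states and would likely use a more robust clustering method.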

Authors
Zhijie Wang
University of Alberta, Edmonton, Alberta, Canada
Yuheng Huang
University of Alberta, Edmonton, Alberta, Canada
Da Song
University of Alberta, Edmonton, Alberta, Canada
Lei Ma
University of Alberta, Edmonton, Alberta, Canada
Tianyi Zhang
Purdue University, West Lafayette, Indiana, United States
Paper URL

https://doi.org/10.1145/3544548.3580852
