Tools for data scientists and Literature Reviews

https://doi.org/10.1145/3544548.3580741

Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data should follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in the real-world data. Though many algorithms have been proposed to detect OOD data from text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DeepLens with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens were able to find nearly twice more types of OOD issues accurately with 22% more confidence compared with a variant of DeepLens that has no interaction or visualization support.

University of Alberta, Edmonton, Alberta, Canada

Purdue University, West Lafayette, Indiana, United States

https://doi.org/10.1145/3544548.3581290

Machine learning practitioners often end up tunneling on low-level technical details like model architectures and performance metrics. Could early model development instead focus on high-level questions of which factors a model ought to pay attention to? Inspired by the practice of sketching in design, which distills ideas to their minimal representation, we introduce model sketching: a technical framework for iteratively and rapidly authoring functional approximations of a machine learning model's decision-making logic. Model sketching refocuses practitioner attention on composing high-level, human-understandable concepts that the model is expected to reason over (e.g., profanity, racism, or sarcasm in a content moderation task) using zero-shot concept instantiation. In an evaluation with 17 ML practitioners, model sketching reframed thinking from implementation to higher-level exploration, prompted iteration on a broader range of model designs, and helped identify gaps in the problem formulation—all in a fraction of the time ordinarily required to build a model.

Stanford University, Stanford, California, United States

Stanford University, San Francisco, California, United States

Stanford University, Stanford, California, United States

Northeastern University, Boston, Massachusetts, United States

Stanford University, Stanford, California, United States

https://doi.org/10.1145/3544548.3581371

In order to help scholars understand and follow a research topic, significant research has been devoted to creating systems that help scholars discover relevant papers and authors. Recent approaches have shown the usefulness of highlighting relevant authors while scholars engage in paper discovery. However, these systems do not capture and utilize users’ evolving knowledge of authors. We reflect on the design space and introduce ComLittee, a literature discovery system that supports author-centric exploration. In contrast to paper-centric interaction in prior systems, ComLittee’s author-centric interaction supports curating research threads from individual authors, finding new authors and papers using combined signals from a paper recommender and the curated authors’ authorship graphs, and understanding them in the context of those signals. In a within-subjects experiment that compares to a paper-centric discovery system with author-highlighting, we demonstrate how ComLittee improves author and paper discovery.

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Massachusetts Institute of Technology, Cambridge, Massachusetts, United States

Allen Institute for AI, Seattle, Washington, United States

Semantic Scholar, Seattle, Washington, United States

Allen Institute for Artificial Intelligence, Seattle, Washington, United States

https://doi.org/10.1145/3544548.3580841

Scholars who want to research a scientific topic must take time to read, extract meaning, and identify connections across many papers. As scientific literature grows, this becomes increasingly challenging. Meanwhile, authors summarize prior research in papers’ related work sections, though this is scoped to support a single paper. A formative study found that while reading multiple related work paragraphs helps overview a topic, it is hard to navigate overlapping and diverging references and research foci. In this work, we design a system, Relatedly, that scaffolds exploring and reading multiple related work paragraphs on a topic, with features including dynamic re-ranking and highlighting to spotlight unexplored dissimilar information, auto-generated descriptive paragraph headings, and low-lighting of redundant information. From a within-subjects user study (n=15), we found that scholars generate more coherent, insightful, and comprehensive topic outlines using Relatedly compared to a baseline paper list.

University of California, San Diego, California, United States

Allen Institute for Artificial Intelligence, Seattle, Washington, United States

University of Washington, Seattle, Washington, United States

Allen Institute for Artificial Intelligence, Seattle, Washington, United States

Semantic Scholar, Seattle, Washington, United States