Interacting with Data

Conference name
CHI 2022
OneLabeler: A Flexible System for Building Data Labeling Tools
Abstract

Labeled datasets are essential for supervised machine learning. Various data labeling tools have been built to collect labels in different usage scenarios. However, developing labeling tools is time-consuming and costly, and it demands software development expertise. In this paper, we propose a conceptual framework for data labeling and, based on that framework, OneLabeler, a system that supports easy building of labeling tools for diverse usage scenarios. The framework consists of common modules and states in labeling tools, summarized through coding of existing tools. OneLabeler supports configuration and composition of common software modules through visual programming to build data labeling tools. A module can be a human, machine, or mixed computation procedure in data labeling. We demonstrate the expressiveness and utility of the system through ten example labeling tools built with OneLabeler. A user study with developers provides evidence that OneLabeler supports efficient building of diverse data labeling tools.

Authors
Yu Zhang
University of Oxford, Oxford, United Kingdom
Yun Wang
Microsoft Research Asia, Beijing, China
Haidong Zhang
Microsoft Research Asia, Beijing, China
Bin Zhu
Microsoft Research Asia, Beijing, China
Siming Chen
Fudan University, Shanghai, China
Dongmei Zhang
Microsoft Research Asia, Beijing, China
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3517612

Video
graphiti: Sketch-based Graph Analytics for Images and Videos
Abstract

Graph analytics is currently performed using a combination of code, symbolic algebra, and network visualizations. The analyst has to work with symbolic and abstract forms of data to construct and analyze graphs. We locate unique design opportunities at the intersection of computer vision and graph analytics, by utilizing visual variables extracted from images/videos and some direct manipulation and pen interaction techniques. We also summarize commonly used graph operations and graphical representations (graphs, simplicial complexes, hypergraphs), and map them to a few brushes and direct manipulation actions. The mapping enables us to visually construct and analyze a wide range of graphs on top of images, videos, and sketches. The design framework is implemented as a sketch-based notebook interface to demonstrate the design possibilities. User studies with scientists from various fields reveal innovative use cases for such an embodied interaction paradigm for graph analytics.

Authors
Nazmus Saquib
Tero Labs, Sunnyvale, California, United States
Faria Huq
Tero Labs, Sunnyvale, California, United States
Syed Arefinul Haque
Northeastern University, Boston, Massachusetts, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3501923

Video
CrossData: Leveraging Text-Data Connections for Authoring Data Documents
Abstract

Data documents play a central role in recording, presenting, and disseminating data. Despite the proliferation of applications and systems designed to support the analysis, visualization, and communication of data, writing data documents remains a laborious process, requiring a constant back-and-forth between data processing and writing tools. Interviews with eight professionals revealed that their workflows contained numerous tedious, repetitive, and error-prone operations. The key issue that we identified is the lack of persistent connection between text and data. Thus, we developed CrossData, a prototype that treats text-data connections as persistent, interactive, first-class objects. By automatically identifying, establishing, and leveraging text-data connections, CrossData enables rich interactions to assist in the authoring of data documents. An expert evaluation with eight users demonstrated the usefulness of CrossData, showing that it not only reduced the manual effort in writing data documents but also opened new possibilities to bridge the gap between data exploration and writing.

Authors
Zhutian Chen
University of California, San Diego, San Diego, California, United States
Haijun Xia
University of California, San Diego, San Diego, California, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3517485

Video
Promptiverse: Scalable Generation of Scaffolding Prompts Through Human-AI Hybrid Knowledge Graph Annotation
Abstract

Online learners are hugely diverse with varying prior knowledge, but most instructional videos online are created to be one-size-fits-all. Thus, learners may struggle to understand the content by only watching the videos. Providing scaffolding prompts can help learners overcome these struggles through questions and hints that relate different concepts in the videos and elicit meaningful learning. However, serving diverse learners would require a spectrum of scaffolding prompts, which incurs high authoring effort. In this work, we introduce Promptiverse, an approach for generating diverse, multi-turn scaffolding prompts at scale, powered by numerous traversal paths over knowledge graphs. To facilitate the construction of the knowledge graphs, we propose a hybrid human-AI annotation tool, Grannotate. In our study (N=24), participants produced 40 times more on-par quality prompts with higher diversity, through Promptiverse and Grannotate, compared to hand-designed prompts. Promptiverse presents a model for creating diverse and adaptive learning experiences online.

Authors
Yoonjoo Lee
KAIST, Daejeon, Republic of Korea
John Joon Young Chung
University of Michigan, Ann Arbor, Michigan, United States
Tae Soo Kim
KAIST, Daejeon, Republic of Korea
Jean Y. Song
DGIST, Daegu, Republic of Korea
Juho Kim
KAIST, Daejeon, Republic of Korea
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502087

Video
Diff in the Loop: Supporting Data Comparison in Exploratory Data Analysis
Abstract

Data science is characterized by evolution: since data science is exploratory, results evolve from moment to moment; since it can be collaborative, results evolve as the work changes hands. While existing tools help data scientists track changes in code, they provide less support for understanding the iterative changes that the code produces in the data. We explore the idea of visualizing differences in datasets as a core feature of exploratory data analysis, a concept we call Diff in the Loop (DITL). We evaluated DITL in a user study with 16 professional data scientists and found it helped them understand the implications of their actions when manipulating data. We summarize these findings and discuss how the approach can be generalized to different data science workflows.

Authors
April Yi Wang
University of Michigan, Ann Arbor, Michigan, United States
Will Epperson
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Robert A. DeLine
Microsoft Corp, Redmond, Washington, United States
Steven M. Drucker
Microsoft Research, Redmond, Washington, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502123

Video