Engineering Interactive Applications

[A] Paper Room 05, 2021-05-13 17:00:00~2021-05-13 19:00:00 / [B] Paper Room 05, 2021-05-14 01:00:00~2021-05-14 03:00:00 / [C] Paper Room 05, 2021-05-14 09:00:00~2021-05-14 11:00:00

Conference Name
CHI 2021
danceON: Culturally Responsive Creative Computing
Abstract

Dance provides unique opportunities for embodied interdisciplinary learning experiences that can be personally and culturally relevant. danceON is a system that supports learners in leveraging their body movement as they engage in artistic practices across data science, computing, and dance. The technology includes a Domain Specific Language (DSL) with declarative syntax and reactive behavior, a media player with pose detection and classification, and a web-based IDE. danceON provides a low floor, allowing users to bind virtual shapes to body positions in under three lines of code, while also enabling complex, dynamic animations that users can design by working with conditionals and past position data. We developed danceON to support distance learning and deployed it in two consecutive cohorts of a remote, two-week summer camp for young women of color. We present our findings from an analysis of the experience and the resulting computational performances. The work identifies implications for how design can support learners' expression across culturally relevant themes and examines challenges through the lens of the usability of the computing language and technology.
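
danceON's DSL itself is not reproduced in this listing. As a rough illustration of the low-floor idea of binding shapes to detected body positions, the Python sketch below assumes a PoseNet-style dictionary of normalized keypoints and a minimal drawing surface; it is a hypothetical stand-in, not danceON syntax.

    # Hypothetical illustration only -- this is not danceON's DSL.
    # Assumes a PoseNet-style pose: a dict of named keypoints with
    # normalized (x, y) coordinates in [0, 1].

    class Canvas:
        """Minimal stand-in for a drawing surface: it just records draw calls."""
        def __init__(self):
            self.calls = []
        def circle(self, x, y, r):
            self.calls.append(("circle", x, y, r))
        def line(self, x1, y1, x2, y2):
            self.calls.append(("line", x1, y1, x2, y2))

    def draw_overlays(pose, canvas):
        # A circle tracks the right wrist in every frame.
        x, y = pose["rightWrist"]
        canvas.circle(x, y, r=0.05)
        # A line connects the two shoulders.
        canvas.line(*pose["leftShoulder"], *pose["rightShoulder"])

    pose = {"rightWrist": (0.62, 0.40),
            "leftShoulder": (0.45, 0.30),
            "rightShoulder": (0.58, 0.31)}
    canvas = Canvas()
    draw_overlays(pose, canvas)
    print(canvas.calls)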

Award
Honorable Mention
Authors
William Christopher Payne
New York University, Brooklyn, New York, United States
Yoav Bergner
New York University, New York, New York, United States
Mary Etta West
CU Boulder, Boulder, Colorado, United States
Carlie Charp
University of Colorado, Boulder, Colorado, United States
R. Benjamin Shapiro
University of Colorado Boulder, Boulder, Colorado, United States
Danielle Albers Szafir
University of Colorado Boulder, Boulder, Colorado, United States
Edd V. Taylor
University of Colorado Boulder, Boulder, Colorado, United States
Kayla DesPortes
New York University, New York, New York, United States
DOI

10.1145/3411764.3445149

Paper URL

https://doi.org/10.1145/3411764.3445149

Video
RubySlippers: Supporting Content-based Voice Navigation for How-to Videos
Abstract

Directly manipulating the timeline, such as scrubbing for thumbnails, is the standard way of controlling how-to videos. However, when how-to videos involve physical activities, people inconveniently alternate between controlling the video and performing the tasks. Adopting a voice user interface allows people to control the video with voice while performing the tasks with their hands. However, naively translating timeline manipulation into a voice user interface (VUI) results in temporal referencing (e.g., "rewind 20 seconds"), which requires a different mental model for navigation and thereby limits users' ability to peek into the content. We present RubySlippers, a system that supports efficient content-based voice navigation through keyword-based queries. Our computational pipeline automatically detects referenceable elements in the video, and finds the video segmentation that minimizes the number of needed navigational commands. Our evaluation (N=12) shows that participants could perform three representative navigation tasks with fewer commands and less frustration using RubySlippers than the conventional voice-enabled video interface.
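
The abstract does not spell out how keyword queries are resolved. The short Python sketch below illustrates one plausible resolution rule, assuming that referenceable keywords have already been detected with timestamps; it is not RubySlippers' actual pipeline.

    # Illustrative sketch, not RubySlippers' actual pipeline.
    # Assumes referenceable keywords were already detected with timestamps (seconds).
    keyword_index = {
        "whisk":   [35.0, 210.0],
        "fold":    [95.0],
        "preheat": [12.0],
    }

    def resolve_query(query, current_time):
        """Map a spoken keyword query to a jump target (seconds).

        Returns the next occurrence after the current playhead, wrapping
        around to the first occurrence if none remains; None if unknown.
        """
        times = sorted(keyword_index.get(query, []))
        if not times:
            return None
        later = [t for t in times if t > current_time]
        return later[0] if later else times[0]

    print(resolve_query("whisk", current_time=100.0))  # -> 210.0
    print(resolve_query("whisk", current_time=300.0))  # -> 35.0 (wraps around)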

Authors
Minsuk Chang
KAIST, Daejeon, Republic of Korea
Mina Huh
KAIST, Daejeon, Republic of Korea
Juho Kim
KAIST, Daejeon, Republic of Korea
DOI

10.1145/3411764.3445131

Paper URL

https://doi.org/10.1145/3411764.3445131

Video
Soloist: Generating Mixed-Initiative Tutorials from Existing Guitar Instructional Videos Through Audio Processing
Abstract

Learning musical instruments using online instructional videos has become increasingly prevalent. However, pre-recorded videos lack the instantaneous feedback and personal tailoring that human tutors provide. In addition, existing video navigation is not optimized for instrument learning, which encumbers the learning experience. Guided by our formative interviews with guitar players and prior literature, we designed Soloist, a mixed-initiative learning framework that automatically generates customizable curriculums from off-the-shelf guitar video lessons. Soloist takes raw videos as input and leverages deep-learning-based audio processing to extract musical information. This back-end processing is used to provide an interactive visualization to support effective video navigation and real-time feedback on the user's performance, creating a guided learning experience. We demonstrate the capabilities and specific use cases of Soloist within the domain of learning electric guitar solos using instructional YouTube videos. A remote user study, conducted to gather feedback from guitar players, shows encouraging results, as the users unanimously preferred learning with Soloist over unconverted instructional videos.
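
As a hedged stand-in for the deep-learning audio processing described above (the paper's actual model is not given in the abstract), the Python sketch below extracts a rough pitch contour from a guitar recording with librosa's pYIN tracker; the file name solo.wav is an assumption.

    # Minimal sketch using librosa's pYIN pitch tracker as a stand-in for
    # Soloist's deep-learning audio processing (not the paper's model).
    # Assumes a local file "solo.wav"; install with: pip install librosa
    import librosa
    import numpy as np

    y, sr = librosa.load("solo.wav", sr=None, mono=True)

    # Track the fundamental frequency over a typical guitar range (E2-E6).
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("E2"), fmax=librosa.note_to_hz("E6"), sr=sr
    )

    # Convert voiced frames to note names for a rough transcription.
    times = librosa.times_like(f0, sr=sr)
    notes = [
        (round(float(t), 2), librosa.hz_to_note(float(f)))
        for t, f, v in zip(times, f0, voiced_flag)
        if v and not np.isnan(f)
    ]
    print(notes[:10])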

Authors
Bryan Wang
University of Toronto, Toronto, Ontario, Canada
Mengyu Yang
University of Toronto, Toronto, Ontario, Canada
Tovi Grossman
University of Toronto, Toronto, Ontario, Canada
DOI

10.1145/3411764.3445162

Paper URL

https://doi.org/10.1145/3411764.3445162

Video
Mindless Attractor: A False-Positive Resistant Intervention for Drawing Attention Using Auditory Perturbation
Abstract

Explicitly alerting users is not always an optimal intervention, especially when they are not motivated to obey. For example, in video-based learning, learners who are distracted from the video would not follow an alert asking them to pay attention. Inspired by the concept of Mindless Computing, we propose a novel intervention approach, Mindless Attractor, that leverages the nature of human speech communication to help learners refocus their attention without relying on their motivation. Specifically, it perturbs the voice in the video to direct their attention without consuming their conscious awareness. Our experiments not only confirmed the validity of the proposed approach but also highlighted its advantages in combination with a machine learning-based sensing module: it would not frustrate users even when the intervention is activated by false-positive detection of their attentive state. Our intervention approach can be a reliable way to induce behavioral change in human-AI symbiosis.
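
One plausible form of the auditory perturbation is a brief pitch shift of the lecture audio. The Python sketch below illustrates this with librosa; the window length and shift amount are illustrative assumptions, not the parameters used in the paper.

    # Hedged sketch of an auditory perturbation: briefly pitch-shift a short
    # window of the lecture audio. Window length and shift amount are
    # illustrative assumptions. Assumes "lecture.wav" exists locally.
    import librosa
    import soundfile as sf

    y, sr = librosa.load("lecture.wav", sr=None, mono=True)

    def perturb(y, sr, at_sec, dur_sec=1.0, n_steps=2.0):
        """Pitch-shift a dur_sec window starting at at_sec by n_steps semitones."""
        start = int(at_sec * sr)
        end = min(len(y), start + int(dur_sec * sr))
        y = y.copy()
        y[start:end] = librosa.effects.pitch_shift(y[start:end], sr=sr, n_steps=n_steps)
        return y

    # Example: nudge the listener's attention 30 seconds into the video.
    sf.write("lecture_perturbed.wav", perturb(y, sr, at_sec=30.0), sr)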

Award
Honorable Mention
Authors
Riku Arakawa
The University of Tokyo, Hongo, Japan
Hiromu Yakura
University of Tsukuba, Tsukuba, Japan
DOI

10.1145/3411764.3445339

Paper URL

https://doi.org/10.1145/3411764.3445339

Video
KTabulator: Interactive Ad hoc Table Creation Using Knowledge Graphs
Abstract

The need to find or construct tables arises routinely to accomplish many tasks in everyday life, as a table is a common format for organizing data. However, when relevant data is found on the web, it is often scattered across multiple tables on different web pages, requiring tedious manual searching and copy-pasting to collect data. We propose KTabulator, an interactive system to effectively extract, build, or extend ad hoc tables from large corpora, by leveraging their computerized structures in the form of knowledge graphs. We developed and evaluated KTabulator using Wikipedia and its knowledge graph DBpedia as our testbed. Starting from an entity or an existing table, KTabulator allows users to extend their tables by finding relevant entities, their properties, and other relevant tables, while providing meaningful suggestions and guidance. The results of a user study indicate the usefulness and efficiency of KTabulator in ad hoc table creation.
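
Since KTabulator is built on DBpedia, a natural first step is pulling candidate properties for a starting entity. The Python sketch below does this with SPARQLWrapper; it only illustrates the kind of raw material the system tabulates, not its actual implementation.

    # Hedged sketch: pull properties of a starting entity from DBpedia,
    # roughly the raw material an ad hoc table could be built from.
    # Not KTabulator's implementation. Install: pip install SPARQLWrapper
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        SELECT ?property ?value WHERE {
            <http://dbpedia.org/resource/University_of_Waterloo> ?property ?value .
        } LIMIT 20
    """)

    rows = sparql.query().convert()["results"]["bindings"]
    for row in rows:
        print(row["property"]["value"], "->", row["value"]["value"])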

Authors
Siyuan Xia
University of Waterloo, Waterloo, Ontario, Canada
Nafisa Anzum
University of Waterloo, Waterloo, Ontario, Canada
Semih Salihoglu
University of Waterloo, Waterloo, Ontario, Canada
Jian Zhao
University of Waterloo, Waterloo, Ontario, Canada
DOI

10.1145/3411764.3445227

Paper URL

https://doi.org/10.1145/3411764.3445227

Video
MorpheesPlug: A Toolkit for Prototyping Shape-Changing Interfaces
Abstract

Toolkits for shape-changing interfaces (SCIs) enable designers and researchers to easily explore the broad design space of SCIs. However, despite their utility, existing approaches are often limited in the number of shape-change features they can express. This paper introduces MorpheesPlug, a toolkit for creating SCIs that covers seven of the eleven shape-change features identified in the literature. MorpheesPlug comprises (1) a set of six standardized widgets that express the shape-change features with user-definable parameters; (2) software for 3D-modeling the widgets to create 3D-printable pneumatic SCIs; and (3) a hardware platform to control the widgets. To evaluate MorpheesPlug, we carried out ten open-ended interviews with novice and expert designers who were asked to design an SCI using our software. Participants highlighted the ease of use and expressivity of MorpheesPlug.
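
The toolkit's own software and firmware are not described in detail in the abstract. The Python sketch below merely illustrates the idea of a widget with user-definable parameters that is driven by a control board; the widget fields and the command format are hypothetical.

    # Hypothetical sketch of a parameterized widget description and a
    # control command; field names and the command format are illustrative
    # assumptions, not the MorpheesPlug toolkit's actual API.
    from dataclasses import dataclass

    @dataclass
    class PneumaticWidget:
        name: str                # placeholder widget type, e.g. "fold"
        channel: int             # valve channel on the control board
        max_pressure_kpa: float  # user-definable parameter

        def inflate_command(self, fraction: float) -> str:
            """Serialize a target pressure as a plain-text command string."""
            target = max(0.0, min(1.0, fraction)) * self.max_pressure_kpa
            return f"SET {self.channel} {target:.1f}"

    widget = PneumaticWidget(name="fold", channel=2, max_pressure_kpa=60.0)
    print(widget.inflate_command(0.5))  # -> "SET 2 30.0"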

Authors
Hyunyoung Kim
University of Copenhagen, Copenhagen, Denmark
Aluna Everitt
University of Bristol, Bristol, United Kingdom
Carlos Tejada
University of Copenhagen, Copenhagen, Denmark
Mengyu Zhong
University of Copenhagen, Copenhagen, Denmark
Daniel Ashbrook
University of Copenhagen, Copenhagen, Denmark
DOI

10.1145/3411764.3445786

Paper URL

https://doi.org/10.1145/3411764.3445786

Video
Understanding Conversational and Expressive Style in a Multimodal Embodied Conversational Agent
Abstract

Embodied conversational agents have changed the ways we can interact with machines. However, these systems often do not meet users' expectations. A limitation is that the agents are monotonic in behavior and do not adapt to an interlocutor. We present SIVA (a Socially Intelligent Virtual Agent), an expressive, embodied conversational agent that can recognize human behavior during open-ended conversations and automatically align its responses to the conversational and expressive style of the other party. SIVA leverages multimodal inputs to produce rich and perceptually valid responses (lip syncing and facial expressions) during the conversation. We conducted a user study (N=30) in which participants rated SIVA as being more empathetic and believable than the control (agent without style matching). Based on almost 10 hours of interaction, participants who preferred interpersonal involvement evaluated SIVA as significantly more animate than the participants who valued consideration and independence.

Authors
Deepali Aneja
Adobe Research, Seattle, Washington, United States
Rens Hoegen
Institute for Creative Technologies, Los Angeles, California, United States
Daniel McDuff
Microsoft, Seattle, Washington, United States
Mary Czerwinski
Microsoft Research, Redmond, Washington, United States
DOI

10.1145/3411764.3445708

Paper URL

https://doi.org/10.1145/3411764.3445708

Video
DeepTake: Prediction of Driver Takeover Behavior using Multimodal Data
Abstract

Automated vehicles promise a future where drivers can engage in non-driving tasks without their hands on the steering wheel for prolonged periods. Nevertheless, automated vehicles may still need to occasionally hand control back to drivers due to technology limitations and legal requirements. While some systems determine the need for driver takeover using driver context and road condition to initiate a takeover request, studies show that the driver may not react to it. We present DeepTake, a novel deep neural network-based framework that predicts multiple aspects of takeover behavior to ensure that the driver is able to safely take over control when engaged in non-driving tasks. Using features from vehicle data, driver biometrics, and subjective measurements, DeepTake predicts the driver's intention, time, and quality of takeover. We evaluate DeepTake's performance using multiple evaluation metrics. Results show that DeepTake reliably predicts takeover intention, time, and quality, with accuracies of 96%, 93%, and 83%, respectively. Results also indicate that DeepTake outperforms previous state-of-the-art methods on predicting driver takeover time and quality. Our findings have implications for the algorithm development of driver monitoring and state detection.
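
As a hedged illustration of a multi-task predictor in the spirit of DeepTake (not the authors' architecture), the PyTorch sketch below uses one shared trunk and three heads for takeover intention, time, and quality; the feature dimension and layer sizes are assumptions.

    # Illustrative multi-task predictor in the spirit of DeepTake (not the
    # authors' architecture): one shared trunk, three heads for takeover
    # intention, time, and quality. Dimensions are assumptions.
    import torch
    import torch.nn as nn

    class TakeoverPredictor(nn.Module):
        def __init__(self, n_features=24):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(),
                nn.Linear(64, 32), nn.ReLU(),
            )
            self.intention = nn.Linear(32, 2)  # take over vs. not
            self.time = nn.Linear(32, 1)       # seconds until takeover
            self.quality = nn.Linear(32, 3)    # low / medium / high

        def forward(self, x):
            h = self.trunk(x)
            return self.intention(h), self.time(h), self.quality(h)

    # One batch of fused vehicle, biometric, and subjective features.
    x = torch.randn(8, 24)
    intention_logits, time_pred, quality_logits = TakeoverPredictor()(x)
    print(intention_logits.shape, time_pred.shape, quality_logits.shape)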

Authors
Erfan Pakdamanian
University of Virginia, Charlottesville, Virginia, United States
Shili Sheng
University of Virginia, Charlottesville, Virginia, United States
Sonia Baee
University of Virginia, Charlottesville, Virginia, United States
Seongkook Heo
University of Virginia, Charlottesville, Virginia, United States
Sarit Kraus
Bar-Ilan University, Ramat-Gan, Israel
Lu Feng
University of Virginia, Charlottesville, Virginia, United States
DOI

10.1145/3411764.3445563

Paper URL

https://doi.org/10.1145/3411764.3445563

Video
Integrating Machine Learning Data with Symbolic Knowledge from Collaboration Practices of Curators to Improve Conversational Systems
Abstract

This paper describes how machine learning training data and symbolic knowledge from curators of conversational systems can be used together to improve the accuracy of those systems and to enable better curatorial tools. This is done in the context of a real-world practice of curators of conversational systems who often embed taxonomically structured meta-knowledge into their documentation. The paper provides evidence that the practice is quite common among curators, that it is used as part of their collaborative practices, and that the embedded knowledge can be mined by algorithms. Further, this meta-knowledge can be integrated, using neuro-symbolic algorithms, into the machine learning-based conversational system to improve its run-time accuracy and to enable tools that support curatorial tasks. These results point toward new ways of designing development tools that explore an integrated use of code and documentation by machines.
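
One concrete way such embedded meta-knowledge can be mined is to recover a taxonomy from structured intent names. The Python sketch below shows this on a hypothetical dotted naming convention, which is an assumption rather than the paper's data.

    # Hedged sketch: curators often embed taxonomy-like structure in intent
    # names or documentation; this toy miner recovers that hierarchy. The
    # dotted naming convention below is a hypothetical example, not the
    # paper's data.
    from collections import defaultdict

    intent_names = [
        "finance.loans.eligibility",
        "finance.loans.payment",
        "finance.cards.limit",
        "support.password.reset",
    ]

    def mine_taxonomy(names):
        """Build a parent -> children mapping from dotted intent names."""
        children = defaultdict(set)
        for name in names:
            parts = name.split(".")
            for parent, child in zip(parts, parts[1:]):
                children[parent].add(child)
        return {k: sorted(v) for k, v in children.items()}

    print(mine_taxonomy(intent_names))
    # {'finance': ['cards', 'loans'], 'loans': ['eligibility', 'payment'], ...}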

Authors
Claudio Santos Pinhanez
IBM Research Brazil, Sao Paulo, Brazil
Heloisa Candello
IBM Research, Sao Paulo, Brazil
Paulo Cavalin
IBM Research Brazil, Sao Paulo, Brazil
Mauro Carlos Pichiliani
IBM Research Brazil, Sao Paulo, Brazil
Ana Paula Appel
IBM Research Brazil, Sao Paulo, Brazil
Victor Henrique Alves Ribeiro
IBM Research Brazil, Sao Paulo, Brazil
Julio Nogima
IBM Research Brazil, Sao Paulo, Brazil
Maira de Bayser
IBM Research Brazil, Sao Paulo, Brazil
Melina Guerra
IBM Research Brazil, Sao Paulo, Brazil
Henrique Ferreira
IBM Research Brazil, Sao Paulo, Brazil
Gabriel Malfatti
IBM Research Brazil, Sao Paulo, Brazil
DOI

10.1145/3411764.3445368

Paper URL

https://doi.org/10.1145/3411764.3445368

Video
Interpretable Program Synthesis
Abstract

Program synthesis, which generates programs based on user-provided specifications, can be obscure and brittle: users have few ways to understand and recover from synthesis failures. We propose interpretable program synthesis, a novel approach that unveils the synthesis process and enables users to monitor and guide the synthesis. We designed three representations that explain the underlying synthesis process with different levels of fidelity. We implemented an interpretable synthesizer and conducted a within-subjects study with eighteen participants on three challenging regular expression programming tasks. With interpretable synthesis, participants were able to reason about synthesis failures and strategically provide feedback, achieving a significantly higher success rate compared with a state-of-the-art synthesizer. In particular, participants with a high engagement tendency (as measured by NCS-6) preferred a deductive representation that shows the synthesis process in a search tree, while participants with a relatively low engagement tendency preferred an inductive representation that renders representative samples of programs enumerated during synthesis.
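
To make the idea of exposing the synthesis process concrete, the Python sketch below runs a toy enumerative regex synthesizer and records every candidate it explores, which could then be rendered as a search tree or as representative samples. The grammar and search strategy are simplified assumptions, not the paper's synthesizer.

    # Toy enumerative regex synthesizer that records every candidate it
    # explores, to illustrate the kind of process an interpretable
    # synthesizer can expose. Grammar and strategy are simplified
    # assumptions, not the paper's synthesizer.
    import re
    from itertools import product

    ATOMS = [r"\d", r"[a-z]", r"-"]
    positives, negatives = ["a-1", "b-2"], ["ab", "12"]

    def matches_spec(pattern):
        rx = re.compile(f"^{pattern}$")
        return (all(rx.match(p) for p in positives)
                and not any(rx.match(n) for n in negatives))

    search_log = []   # (depth, candidate, accepted?) -- renderable as a tree
    solution = None
    for depth in range(1, 4):                       # programs of 1..3 atoms
        for combo in product(ATOMS, repeat=depth):  # enumerate by length
            candidate = "".join(combo)
            ok = matches_spec(candidate)
            search_log.append((depth, candidate, ok))
            if ok and solution is None:
                solution = candidate

    print("explored", len(search_log), "candidates; solution:", solution)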

Authors
Tianyi Zhang
Harvard University, Cambridge, Massachusetts, United States
Zhiyang Chen
University of Michigan, Ann Arbor, Michigan, United States
Yuanli Zhu
University of Michigan, Ann Arbor, Michigan, United States
Priyan Vaithilingam
Harvard University, Cambridge, Massachusetts, United States
Xinyu Wang
University of Michigan, Ann Arbor, Michigan, United States
Elena L. Glassman
Harvard University, Cambridge, Massachusetts, United States
DOI

10.1145/3411764.3445646

Paper URL

https://doi.org/10.1145/3411764.3445646

Video
Falx: Synthesis-Powered Visualization Authoring
Abstract

Modern visualization tools aim to allow data analysts to easily create exploratory visualizations. When the input data layout conforms to the visualization design, users can easily specify visualizations by mapping data columns to visual channels of the design. However, when there is a mismatch between data layout and the design, users need to spend significant effort on data transformation. We propose Falx, a synthesis-powered visualization tool that allows users to specify visualizations in a similarly simple way but without needing to worry about data layout. In Falx, users specify visualizations using examples of how concrete values in the input are mapped to visual channels, and Falx automatically infers the visualization specification and transforms the data to match the design. In a study with 33 data analysts on four visualization tasks involving data transformation, we found that users can effectively adopt Falx to create visualizations they otherwise cannot implement.
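
A minimal sketch of example-based specification in the spirit of Falx (not its algorithm): the user states how one concrete value maps to visual channels, and the Python code below searches over simple pandas.melt reshapings until a transformed table contains that example. The data and column names are assumptions.

    # Hedged sketch of example-based specification in the spirit of Falx
    # (not its algorithm): search over simple pandas.melt reshapings until
    # the user's example mapping holds in the transformed table.
    import pandas as pd

    wide = pd.DataFrame({
        "year": [2019, 2020],
        "product_A": [100, 120],
        "product_B": [90, 130],
    })

    # "Plot x=2019, color=product_A, y=100" -- one concrete example.
    example = {"x": 2019, "color": "product_A", "y": 100}

    def find_layout(df, example):
        for id_col in df.columns:
            tidy = df.melt(id_vars=[id_col], var_name="series", value_name="value")
            hit = tidy[(tidy[id_col] == example["x"])
                       & (tidy["series"] == example["color"])
                       & (tidy["value"] == example["y"])]
            if not hit.empty:
                return id_col, tidy
        return None, None

    x_column, tidy = find_layout(wide, example)
    print("x column:", x_column)
    print(tidy.head())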

Award
Best Paper
Authors
Chenglong Wang
University of Washington, Seattle, Washington, United States
Yu Feng
University of California, Santa Barbara, Santa Barbara, California, United States
Rastislav Bodik
University of Washington, Seattle, Washington, United States
Isil Dillig
University of Texas, Austin, Austin, Texas, United States
Alvin Cheung
University of California, Berkeley, Berkeley, California, United States
Amy J. Ko
University of Washington, Seattle, Washington, United States
DOI

10.1145/3411764.3445249

Paper URL

https://doi.org/10.1145/3411764.3445249

Video
XRStudio: A Virtual Production and Live Streaming System for Immersive Instructional Experiences
Abstract

There is increased interest in using virtual reality in education, but it often remains an isolated experience that is difficult to integrate into current instructional experiences. In this work, we adapt virtual production techniques from filmmaking to enable mixed reality capture of instructors so that they appear to be standing directly in the virtual scene. We also capitalize on the growing popularity of live streaming software for video conferencing and live production. With XRStudio, we develop a pipeline for giving lectures in VR, enabling live compositing using a variety of presets and real-time output to traditional video and more immersive formats. We present interviews with media designers experienced in film and MOOC production that informed our design. Through walkthrough demonstrations of XRStudio with instructors experienced with VR, we learn how it could be used in a variety of domains. In end-to-end evaluations with students, we analyze and compare traditional video lectures with more immersive XRStudio lectures.

Authors
Michael Nebeling
University of Michigan, Ann Arbor, Michigan, United States
Shwetha Rajaram
University of Michigan, Ann Arbor, Michigan, United States
Liwei Wu
University of Michigan, Ann Arbor, Michigan, United States
Yifei Cheng
Swarthmore College, Swarthmore, Pennsylvania, United States
Jaylin Herskovitz
University of Michigan, Ann Arbor, Michigan, United States
DOI

10.1145/3411764.3445323

Paper URL

https://doi.org/10.1145/3411764.3445323

Video
Automatic Generation of Two Level Hierarchical Tutorials from Instructional Makeup Videos
Abstract

We present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that objects correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to automatically identify the fine-level action steps and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of our segmentation algorithms and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI and that they find our format much easier to navigate.
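
The grouping of fine-grained action steps into coarse facial-part events can be illustrated with a simple keyword heuristic, as in the Python sketch below; the keyword lists and transcript steps are illustrative assumptions, not the paper's full pipeline.

    # Simplified sketch of the grouping idea (not the paper's full
    # pipeline): assign transcript-derived action steps to coarse
    # facial-part events by keyword matching. Keywords and steps below are
    # illustrative assumptions.
    FACIAL_PARTS = {
        "eyes": ["eyeshadow", "eyeliner", "lash", "brow"],
        "lips": ["lipstick", "lip liner", "gloss"],
        "skin": ["foundation", "concealer", "blush", "contour"],
    }

    steps = [
        (12.5, "apply foundation evenly with a damp sponge"),
        (95.0, "blend the eyeshadow into the crease"),
        (140.0, "line the lips before adding gloss"),
    ]

    def group_by_facial_part(steps):
        hierarchy = {part: [] for part in FACIAL_PARTS}
        for time, text in steps:
            for part, keywords in FACIAL_PARTS.items():
                if any(k in text for k in keywords):
                    hierarchy[part].append((time, text))
                    break
        return {part: events for part, events in hierarchy.items() if events}

    print(group_by_facial_part(steps))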

Authors
Anh Truong
Stanford University, Stanford, California, United States
Peggy Chi
Google Research, Mountain View, California, United States
David Salesin
Google Research, San Francisco, California, United States
Irfan Essa
Google, Atlanta, Georgia, United States
Maneesh Agrawala
Stanford University, Stanford, California, United States
DOI

10.1145/3411764.3445721

Paper URL

https://doi.org/10.1145/3411764.3445721

Video
Beyond Show of Hands: Engaging Viewers via Expressive and Scalable Visual Communication in Live Streaming
Abstract

Live streaming has been gaining popularity across diverse application domains in recent years. A core part of the experience is streamer-viewer interaction, which has been mainly text-based. Recent systems have explored extending viewer interaction to include visual elements with richer expression and increased engagement. However, understanding expressive visual inputs becomes challenging with many viewers, primarily due to the relative lack of structure in visual input. On the other hand, adding rigid structures can limit viewer interactions to narrow use cases or decrease the expressiveness of viewer inputs. To facilitate the sensemaking of many visual inputs while retaining the expressiveness or versatility of viewer interactions, we introduce a visual input management framework (VIMF) and a system, VisPoll, that help streamers specify, aggregate, and visualize many visual inputs. A pilot evaluation indicated that VisPoll can expand the types of viewer interactions. Our framework provides insights for designing scalable and expressive visual communication for live streaming.
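
One simple aggregation strategy for many visual inputs, sketched in Python below, is to bin viewers' normalized 2D positions into a coarse grid so the streamer sees a compact summary; this illustrates the aggregation step only, not VisPoll's actual framework.

    # Hedged sketch of one aggregation strategy for many visual inputs (not
    # VisPoll's actual framework): bin viewers' normalized 2D positions
    # into a coarse grid so the streamer sees a compact summary.
    from collections import Counter

    viewer_points = [(0.12, 0.80), (0.15, 0.78), (0.90, 0.10),
                     (0.14, 0.82), (0.88, 0.12), (0.50, 0.50)]

    def aggregate(points, grid=4):
        """Count inputs per grid cell; coordinates are normalized to [0, 1]."""
        cells = Counter()
        for x, y in points:
            col = min(int(x * grid), grid - 1)
            row = min(int(y * grid), grid - 1)
            cells[(row, col)] += 1
        return cells

    print(aggregate(viewer_points).most_common(3))
    # -> [((3, 0), 3), ((0, 3), 2), ((2, 2), 1)]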

Authors
John Joon Young Chung
University of Michigan, Ann Arbor, Michigan, United States
Hijung Valentina Shin
Adobe Research, Cambridge, Massachusetts, United States
Haijun Xia
University of California, San Diego, San Diego, California, United States
Li-yi Wei
Adobe Research, San Jose, California, United States
Rubaiat Habib Kazi
Adobe Research, Seattle, Washington, United States
DOI

10.1145/3411764.3445419

Paper URL

https://doi.org/10.1145/3411764.3445419

Video