Videos

Conference Name
CHI 2023
Stargazer: An Interactive Camera Robot for Capturing How-To Videos Based on Subtle Instructor Cues
Abstract

Live and pre-recorded video tutorials are an effective means for teaching physical skills such as cooking or prototyping electronics. A dedicated cameraperson following an instructor’s activities can improve production quality. However, instructors who do not have access to a cameraperson’s help often have to work within the constraints of static cameras. We present Stargazer, a novel approach for assisting with tutorial content creation with a camera robot that autonomously tracks regions of interest based on instructor actions to capture dynamic shots. Instructors can adjust the camera behaviors of Stargazer with subtle cues, including gestures and speech, allowing them to fluidly integrate camera control commands into instructional activities. Our user study with six instructors, each teaching a distinct skill, showed that participants could create dynamic tutorial videos with a diverse range of subjects, camera framing, and camera angle combinations using Stargazer.
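As a rough illustration of the cue-driven control described in the abstract, the Python sketch below maps detected instructor cues (gestures or speech fragments) to camera behaviors. The cue names, shot parameters, and dispatch logic are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the Stargazer implementation): dispatching detected
# instructor cues to camera behaviors. Cue and behavior names are assumptions.
from dataclasses import dataclass
from typing import Dict

@dataclass
class CameraCommand:
    subject: str   # what to frame, e.g. "hands", "instructor", "object_held"
    framing: str   # e.g. "close_up", "medium", "wide"
    angle: str     # e.g. "overhead", "front", "side"

# Hypothetical cue vocabulary: subtle gestures and speech fragments.
CUE_TO_COMMAND: Dict[str, CameraCommand] = {
    "point_at_object": CameraCommand("pointed_object", "close_up", "overhead"),
    "hold_up_to_camera": CameraCommand("object_held", "close_up", "front"),
    "speech:look_here": CameraCommand("hands", "medium", "front"),
}

def dispatch(cue: str, default: CameraCommand) -> CameraCommand:
    """Return the camera behavior for a detected cue, else keep the default shot."""
    return CUE_TO_COMMAND.get(cue, default)

if __name__ == "__main__":
    default_shot = CameraCommand("instructor", "wide", "front")
    for cue in ["point_at_object", "unknown_gesture", "speech:look_here"]:
        print(cue, "->", dispatch(cue, default_shot))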

Authors
Jiannan Li
University of Toronto, Toronto, Ontario, Canada
Mauricio Sousa
University of Toronto, Toronto, Ontario, Canada
Karthik Mahadevan
University of Toronto, Toronto, Ontario, Canada
Bryan Wang
University of Toronto, Toronto, Ontario, Canada
Paula Akemi Aoyagui
University of Toronto, Toronto, Ontario, Canada
Nicole Yu
University of Toronto, Toronto, Ontario, Canada
Angela Yang
University of Toronto, Toronto, Ontario, Canada
Ravin Balakrishnan
University of Toronto, Toronto, Ontario, Canada
Anthony Tang
University of Toronto, Toronto, Ontario, Canada
Tovi Grossman
University of Toronto, Toronto, Ontario, Canada
Paper URL

https://doi.org/10.1145/3544548.3580896

Video
Beyond Instructions: A Taxonomy of Information Types in How-to Videos
Abstract

How-to videos are rich in information---they not only give instructions but also provide justifications or descriptions. People seek different information to meet their needs, and identifying different types of information present in the video can improve access to the desired knowledge. Thus, we present a taxonomy of information types in how-to videos. Through an iterative open coding of 4k sentences in 48 videos, 21 information types under 8 categories emerged. The taxonomy represents diverse information types that instructors provide beyond instructions. We first show how our taxonomy can serve as an analytical framework for video navigation systems. Then, we demonstrate through a user study (n=9) how type-based navigation helps participants locate the information they needed. Finally, we discuss how the taxonomy enables a wide range of video-related tasks, such as video authoring, viewing, and analysis. To allow researchers to build upon our taxonomy, we release a dataset of 120 videos containing 9.9k sentences labeled using the taxonomy.
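To illustrate how a labeled dataset like the one the authors release could support type-based navigation, the Python sketch below filters sentences by information type. The file name and field names (label, start_time, sentence) are assumptions, not the released schema.

# Minimal sketch of type-based navigation over labeled how-to video sentences.
import json

def segments_of_type(path: str, wanted_type: str):
    """Yield (start_time, sentence) pairs whose label matches the requested type."""
    with open(path, encoding="utf-8") as f:
        for record in json.load(f):
            if record["label"] == wanted_type:          # e.g. "justification"
                yield record["start_time"], record["sentence"]

if __name__ == "__main__":
    # "howto_labels.json" is a hypothetical file following the assumed schema.
    for t, s in segments_of_type("howto_labels.json", "justification"):
        print(f"{t:>8.1f}s  {s}")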

Authors
Saelyne Yang
School of Computing, KAIST, Daejeon, Korea, Republic of
Sangkyung Kwak
School of Computing, KAIST, Daejeon, Korea, Republic of
Juhoon Lee
School of Computing, KAIST, Daejeon, Korea, Republic of
Juho Kim
KAIST, Daejeon, Korea, Republic of
Paper URL

https://doi.org/10.1145/3544548.3581126

Video
AVscript: Accessible Video Editing with Audio-Visual Scripts
Abstract

Sighted and blind and low vision (BLV) creators alike use videos to communicate with broad audiences. Yet, video editing remains inaccessible to BLV creators. Our formative study revealed that current video editing tools make it difficult to access the visual content, assess the visual quality, and efficiently navigate the timeline. We present AVscript, an accessible text-based video editor. AVscript enables users to edit their video using a script that embeds the video's visual content, visual errors (e.g., dark or blurred footage), and speech. Users can also efficiently navigate between scenes and visual errors or locate objects in the frame or spoken words of interest. A comparison study (N=12) showed that AVscript significantly lowered BLV creators' mental demands while increasing confidence and independence in video editing. We further demonstrate the potential of AVscript through an exploratory study (N=3) where BLV creators edited their own footage.
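The visual errors the abstract mentions (dark or blurred footage) can be approximated with common heuristics, shown in the Python sketch below using mean brightness and the variance of the Laplacian. This is a generic approach and not necessarily how AVscript flags errors; the thresholds and input file are illustrative.

# Minimal sketch of flagging dark or blurred frames with standard heuristics.
import cv2

def frame_issues(frame, dark_thresh=40.0, blur_thresh=100.0):
    """Return a list of issue labels for a single BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    issues = []
    if gray.mean() < dark_thresh:                        # low average brightness
        issues.append("dark")
    if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_thresh:  # low edge energy
        issues.append("blurred")
    return issues

if __name__ == "__main__":
    cap = cv2.VideoCapture("footage.mp4")   # hypothetical input file
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        problems = frame_issues(frame)
        if problems:
            print(f"frame {idx}: {', '.join(problems)}")
        idx += 1
    cap.release()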

Authors
Mina Huh
University of Texas, Austin, Austin, Texas, United States
Saelyne Yang
School of Computing, KAIST, Daejeon, Korea, Republic of
Yi-Hao Peng
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Xiang 'Anthony' Chen
UCLA, Los Angeles, California, United States
Young-Ho Kim
NAVER AI Lab, Seongnam, Gyeonggi, Korea, Republic of
Amy Pavel
University of Texas, Austin, Austin, Texas, United States
Paper URL

https://doi.org/10.1145/3544548.3581494

Video
Surch: Enabling Structural Search and Comparison for Surgical Videos
Abstract

Video is an effective medium for learning procedural knowledge, such as surgical techniques. However, learning procedural knowledge through videos remains difficult due to limited access to procedural structures of knowledge (e.g., compositions and ordering of steps) in a large-scale video dataset. We present Surch, a system that enables structural search and comparison of surgical procedures. Surch supports video search based on procedural graphs generated by our clustering workflow capturing latent patterns within surgical procedures. We used vectorization and weighting schemes that characterize the features of procedures, such as recursive structures and unique paths. Surch enhances cross-video comparison by providing video navigation synchronized by surgical steps. Evaluation of the workflow demonstrates the effectiveness and interpretability (Silhouette score = 0.82) of our clustering for surgical learning. A user study with 11 residents shows that our system significantly improves the learning experience and task efficiency of video search and comparison, especially benefiting junior residents.
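The silhouette score reported in the abstract (0.82) is a standard clustering-quality metric; the Python sketch below shows how it is computed over vectorized procedures, with random features and k-means standing in for the paper's own vectorization, weighting, and clustering workflow.

# Minimal sketch of scoring a clustering of procedure vectors; the features and
# the k-means clusterer are placeholders, not the Surch pipeline.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
procedure_vectors = rng.normal(size=(60, 16))   # placeholder procedure features

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(procedure_vectors)
print("silhouette:", round(silhouette_score(procedure_vectors, labels), 2))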

Authors
Jeongyeon Kim
University of California, San Diego, San Diego, California, United States
DaEun Choi
KAIST, Daejeon, Korea, Republic of
Nicole Lee
KAIST, Daejeon, Korea, Republic of
Matt Beane
University of California, Santa Barbara, Santa Barbara, California, United States
Juho Kim
KAIST, Daejeon, Korea, Republic of
Paper URL

https://doi.org/10.1145/3544548.3580772

Video
Colaroid: A Literate Programming Approach for Authoring Explorable Multi-Stage Tutorials
Abstract

Multi-stage programming tutorials are key learning resources for programmers, using progressive incremental steps to teach them how to build larger software systems. A good multi-stage tutorial describes the code clearly, explains the rationale and code changes for each step, and allows readers to experiment as they work through the tutorial. In practice, it is time-consuming for authors to create tutorials with these attributes. In this paper, we introduce Colaroid, an interactive authoring tool for creating high quality multi-stage tutorials. Colaroid tutorials are augmented computational notebooks, where snippets and outputs represent a snapshot of a project, with source code differences highlighted, complete source code context for each snippet, and the ability to load and tinker with any stage of the project in a linked IDE. In two laboratory studies, we found Colaroid makes it easy to create multi-stage tutorials, while offering advantages to readers compared to video and web-based tutorials.
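The stage-to-stage code differences that Colaroid highlights can be illustrated with a plain text diff; the Python sketch below compares two hypothetical snapshots of a tutorial project using difflib. It demonstrates only the diffing idea, not the authors' notebook or IDE integration.

# Minimal sketch of diffing two tutorial stages; snapshots are made up.
import difflib

stage_1 = """def greet():
    print("hello")
"""

stage_2 = """def greet(name):
    print(f"hello, {name}")
"""

diff = difflib.unified_diff(
    stage_1.splitlines(keepends=True),
    stage_2.splitlines(keepends=True),
    fromfile="stage_1/app.py",
    tofile="stage_2/app.py",
)
print("".join(diff))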

Award
Honorable Mention
Authors
April Yi Wang
University of Michigan, Ann Arbor, Michigan, United States
Andrew Head
University of Pennsylvania, Philadelphia, Pennsylvania, United States
Ashley Ge Zhang
University of Michigan, Ann Arbor, Michigan, United States
Steve Oney
University of Michigan, Ann Arbor, Michigan, United States
Christopher Brooks
University of Michigan, Ann Arbor, Michigan, United States
Paper URL

https://doi.org/10.1145/3544548.3581525

Video
Escapement: A Tool for Interactive Prototyping with Video via Sensor-Mediated Abstraction of Time
Abstract

We present Escapement, a video prototyping tool that introduces a powerful new concept for prototyping screen-based interfaces by flexibly mapping sensor values to dynamic playback control of videos. This recasts the time dimension of video mock-ups as sensor-mediated interaction. This abstraction of time as interaction, which we dub video-escapement prototyping, empowers designers to rapidly explore and viscerally experience direct touch or sensor-mediated interactions across one or more device displays. Our system affords cross-device and bidirectional remote (tele-present) experiences via cloud-based state sharing across multiple devices. This makes Escapement especially potent for exploring multi-device, dual-screen, or remote-work interactions for screen-based applications. We introduce the core concept of sensor-mediated abstraction of time for quickly generating video-based interactive prototypes of screen-based applications, share the results of observations of long-term usage of video-escapement techniques with experienced interaction designers, and articulate design choices for supporting a reflective, iterative, and open-ended creative design process.
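The core idea of mapping sensor values to playback time can be sketched in a few lines; the Python example below linearly maps a raw sensor reading onto a position in a clip. The sensor range and clip length are illustrative, and Escapement's cloud-based state sharing across devices is omitted.

# Minimal sketch of "sensor-mediated abstraction of time": a sensor reading
# selects a playback position in a video mock-up.
def sensor_to_playback(value, sensor_min=0.0, sensor_max=1023.0, clip_seconds=8.0):
    """Linearly map a raw sensor value to a timestamp within the clip."""
    t = (value - sensor_min) / (sensor_max - sensor_min)  # normalize to [0, 1]
    t = min(max(t, 0.0), 1.0)                             # clamp out-of-range readings
    return t * clip_seconds

if __name__ == "__main__":
    for reading in [0, 256, 512, 1023, 1500]:
        print(reading, "->", f"{sensor_to_playback(reading):.2f}s")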

Authors
Molly Jane Nicholas
UC Berkeley, Berkeley, California, United States
Nicolai Marquardt
University College London, London, United Kingdom
Michel Pahud
Microsoft Research, Redmond, Washington, United States
Nathalie Riche
Microsoft Research, Redmond, Washington, United States
Hugo Romat
Microsoft, Seattle, Washington, United States
Christopher Collins
Ontario Tech University, Oshawa, Ontario, Canada
David Ledo
Autodesk Research, Toronto, Ontario, Canada
Rohan Kadekodi
University of Texas, Austin, Austin, Texas, United States
Badrish Chandramouli
Microsoft Research, Redmond, Washington, United States
Ken Hinckley
Microsoft Research, Redmond, Washington, United States
Paper URL

https://doi.org/10.1145/3544548.3581115

Video