Live and pre-recorded video tutorials are an effective means for teaching physical skills such as cooking or prototyping electronics. A dedicated cameraperson following an instructor’s activities can improve production quality. However, instructors who do not have access to a cameraperson’s help often have to work within the constraints of static cameras. We present Stargazer, a novel approach to assisting tutorial content creation, using a camera robot that autonomously tracks regions of interest based on instructor actions to capture dynamic shots. Instructors can adjust the camera behaviors of Stargazer with subtle cues, including gestures and speech, allowing them to fluidly integrate camera control commands into instructional activities. Our user study with six instructors, each teaching a distinct skill, showed that participants could create dynamic tutorial videos with a diverse range of subjects, camera framing, and camera angle combinations using Stargazer.
https://doi.org/10.1145/3544548.3580896
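To make the region-of-interest tracking concrete, the following sketch shows one simple way a camera robot could keep a tracked region centered in the frame using proportional control. The frame resolution, gains, sign convention, and tracker output format are illustrative assumptions, not Stargazer's actual control code.

```python
# Minimal sketch (not Stargazer's implementation): a proportional controller
# that nudges a camera robot's pan/tilt so a tracked region of interest stays
# centered. Resolution, gains, and sign convention are assumptions.
from dataclasses import dataclass

FRAME_W, FRAME_H = 1920, 1080        # assumed capture resolution
PAN_GAIN, TILT_GAIN = 0.002, 0.002   # assumed gains (degrees per pixel of error)

@dataclass
class RegionOfInterest:
    cx: float  # center x of the tracked region, in pixels
    cy: float  # center y of the tracked region, in pixels

def pan_tilt_update(roi: RegionOfInterest) -> tuple[float, float]:
    """Return (pan_delta_deg, tilt_delta_deg) that moves the camera toward the ROI."""
    error_x = roi.cx - FRAME_W / 2
    error_y = roi.cy - FRAME_H / 2
    return (PAN_GAIN * error_x, TILT_GAIN * error_y)

# Example: the instructor's hands drift toward the right of the frame,
# so the controller pans slightly to follow them.
print(pan_tilt_update(RegionOfInterest(cx=1400, cy=560)))
```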
How-to videos are rich in information---they not only give instructions but also provide justifications or descriptions. People seek different information to meet their needs, and identifying the different types of information present in a video can improve access to the desired knowledge. Thus, we present a taxonomy of information types in how-to videos. Through iterative open coding of 4k sentences in 48 videos, we identified 21 information types under 8 categories. The taxonomy represents the diverse information types that instructors provide beyond instructions. We first show how our taxonomy can serve as an analytical framework for video navigation systems. Then, we demonstrate through a user study (n=9) how type-based navigation helps participants locate the information they need. Finally, we discuss how the taxonomy enables a wide range of video-related tasks, such as video authoring, viewing, and analysis. To allow researchers to build upon our taxonomy, we release a dataset of 120 videos containing 9.9k sentences labeled using the taxonomy.
https://doi.org/10.1145/3544548.3581126
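As an illustration of how such type labels can support navigation, the sketch below filters a labeled transcript down to the timestamps of a single information type. The record schema, type names, and example sentences are hypothetical stand-ins rather than the released dataset's actual format.

```python
# Minimal sketch of type-based navigation over a labeled how-to transcript.
# The (start time, information type, text) schema is a hypothetical stand-in.
labeled_sentences = [
    {"start": 12.0, "type": "instruction",   "text": "Whisk the eggs until foamy."},
    {"start": 19.5, "type": "justification", "text": "This keeps the batter light."},
    {"start": 27.3, "type": "instruction",   "text": "Fold in the flour gently."},
]

def jump_points(sentences, info_type):
    """Return (timestamp, text) pairs for all sentences of one information type."""
    return [(s["start"], s["text"]) for s in sentences if s["type"] == info_type]

# A viewer who only wants the steps can jump directly between instructions.
for t, text in jump_points(labeled_sentences, "instruction"):
    print(f"{t:6.1f}s  {text}")
```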
Sighted creators and blind and low vision (BLV) creators alike use videos to communicate with broad audiences. Yet, video editing remains inaccessible to BLV creators. Our formative study revealed that current video editing tools make it difficult to access the visual content, assess the visual quality, and efficiently navigate the timeline. We present AVscript, an accessible text-based video editor. AVscript enables users to edit their video using a script that embeds the video's visual content, visual errors (e.g., dark or blurred footage), and speech. Users can also efficiently navigate between scenes and visual errors or locate objects in the frame or spoken words of interest. A comparison study (N=12) showed that AVscript significantly lowered BLV creators' mental demands while increasing confidence and independence in video editing. We further demonstrate the potential of AVscript through an exploratory study (N=3) in which BLV creators edited their own footage.
https://doi.org/10.1145/3544548.3581494
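To illustrate the kind of visual-error signals such a script could surface, the sketch below flags dark or blurred frames using mean brightness and variance of the Laplacian. These heuristics and thresholds are common stand-ins for illustration, not necessarily AVscript's actual detectors.

```python
# Minimal sketch of flagging dark or blurred footage, one heuristic per sampled
# frame. Thresholds are assumptions; the Laplacian-variance blur measure is a
# common stand-in technique, not necessarily what AVscript uses.
import cv2

DARK_THRESHOLD = 40.0    # assumed mean-brightness cutoff (0-255 grayscale)
BLUR_THRESHOLD = 100.0   # assumed variance-of-Laplacian cutoff

def flag_visual_errors(path, sample_every=30):
    cap = cv2.VideoCapture(path)
    flags, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if gray.mean() < DARK_THRESHOLD:
                flags.append((frame_idx, "dark"))
            elif cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD:
                flags.append((frame_idx, "blurred"))
        frame_idx += 1
    cap.release()
    return flags
```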
Video is an effective medium for learning procedural knowledge, such as surgical techniques. However, learning procedural knowledge through videos remains difficult due to limited access to the procedural structure of knowledge (e.g., the composition and ordering of steps) across large-scale video datasets. We present Surch, a system that enables structural search and comparison of surgical procedures. Surch supports video search based on procedural graphs generated by our clustering workflow, which captures latent patterns within surgical procedures. We used vectorization and weighting schemes that characterize the features of procedures, such as recursive structures and unique paths. Surch enhances cross-video comparison by providing video navigation synchronized by surgical steps. Evaluation of the workflow demonstrates the effectiveness and interpretability (Silhouette score = 0.82) of our clustering for surgical learning. A user study with 11 residents shows that our system significantly improves the learning experience and task efficiency of video search and comparison, especially benefiting junior residents.
https://doi.org/10.1145/3544548.3580772
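As a rough analogue of the clustering and evaluation step, the sketch below vectorizes toy step sequences with bigram counts, clusters them, and reports a silhouette score. The bigram vectorization, cluster count, and step names are illustrative assumptions standing in for Surch's actual vectorization and weighting schemes.

```python
# Minimal sketch: cluster procedures represented as step sequences, then score
# cluster quality with the silhouette coefficient. Data and vectorization are
# illustrative stand-ins for Surch's workflow.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Each procedure is a sequence of step labels (hypothetical data).
procedures = [
    "incision dissection retraction suturing",
    "incision dissection suturing",
    "incision retraction dissection retraction suturing",
    "incision cauterization dissection suturing",
]

# Encode each procedure by its step bigrams to capture local ordering.
X = CountVectorizer(ngram_range=(2, 2)).fit_transform(procedures).toarray()

labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print("clusters:", labels)
print("silhouette:", silhouette_score(X, labels))
```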
Multi-stage programming tutorials are key learning resources for programmers, using progressive incremental steps to teach them how to build larger software systems. A good multi-stage tutorial describes the code clearly, explains the rationale and code changes for each step, and allows readers to experiment as they work through the tutorial. In practice, it is time-consuming for authors to create tutorials with these attributes. In this paper, we introduce Colaroid, an interactive authoring tool for creating high quality multi-stage tutorials. Colaroid tutorials are augmented computational notebooks, where snippets and outputs represent a snapshot of a project, with source code differences highlighted, complete source code context for each snippet, and the ability to load and tinker with any stage of the project in a linked IDE. In two laboratory studies, we found Colaroid makes it easy to create multi-stage tutorials, while offering advantages to readers compared to video and web-based tutorials.
https://doi.org/10.1145/3544548.3581525
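To show the flavor of stage-to-stage difference highlighting, the sketch below renders the change between two snapshot stages as a unified diff using only the Python standard library; Colaroid's in-notebook rendering is richer, and the example snippets are made up.

```python
# Minimal sketch: show the code change between two tutorial stages as a
# unified diff (a plain stand-in for Colaroid's highlighted differences).
import difflib

stage_1 = """def greet(name):
    print("Hello, " + name)
"""

stage_2 = """def greet(name, excited=False):
    suffix = "!" if excited else "."
    print("Hello, " + name + suffix)
"""

diff = difflib.unified_diff(
    stage_1.splitlines(keepends=True),
    stage_2.splitlines(keepends=True),
    fromfile="stage_1/greet.py",   # hypothetical snapshot paths
    tofile="stage_2/greet.py",
)
print("".join(diff))
```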
We present Escapement, a video prototyping tool that introduces a powerful new concept for prototyping screen-based interfaces by flexibly mapping sensor values to dynamic playback control of videos. This recasts the time dimension of video mock-ups as sensor-mediated interaction. This abstraction of time as interaction, which we dub video-escapement prototyping, empowers designers to rapidly explore and viscerally experience direct touch or sensor-mediated interactions across one or more device displays. Our system affords cross-device and bidirectional remote (tele-present) experiences via cloud-based state sharing across multiple devices. This makes Escapement especially potent for exploring multi-device, dual-screen, or remote-work interactions for screen-based applications. We introduce the core concept of sensor-mediated abstraction of time for quickly generating video-based interactive prototypes of screen-based applications, share the results of observations of long-term usage of video-escapement techniques with experienced interaction designers, and articulate design choices for supporting a reflective, iterative, and open-ended creative design process.
https://doi.org/10.1145/3544548.3581115
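To make the time-as-interaction abstraction concrete, the sketch below maps a sensor reading linearly onto a frame index and seeks a mock-up video to that frame. The sensor range, file name, and OpenCV-based seek are assumptions for illustration, not Escapement's implementation.

```python
# Minimal sketch of the video-escapement idea: a sensor reading, rather than a
# clock, drives playback position. Sensor range and file name are hypothetical.
import cv2

def sensor_to_frame(sensor_value, sensor_min, sensor_max, frame_count):
    """Linearly map a raw sensor value onto a frame index in the mock-up video."""
    t = (sensor_value - sensor_min) / (sensor_max - sensor_min)
    t = min(max(t, 0.0), 1.0)                    # clamp to the valid range
    return int(round(t * (frame_count - 1)))

cap = cv2.VideoCapture("mockup.mp4")             # hypothetical prototype video
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# e.g., a rotation sensor reporting 0-270 degrees scrubs the playback position:
idx = sensor_to_frame(135.0, 0.0, 270.0, frame_count)
cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
ok, frame = cap.read()                           # the frame shown for this sensor pose
```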