Consumer speech recognition systems do not work as well for many people with speech differences, such as stuttering, as they do for the general population. However, it is unclear how poorly these systems perform, how they could be improved, or how much people who stutter want to use them. In this paper, we first address these questions using results from a 61-person survey of people who stutter, finding that participants want to use speech recognition but are frequently cut off or misunderstood, and that speech predictions often do not represent their intent. In a second study, where 91 people who stutter recorded voice assistant commands and dictation, we quantify how dysfluencies impede performance in a consumer-grade speech recognition system. Through three technical investigations, we demonstrate how many common errors can be prevented, resulting in a system that cuts utterances off 79.1% less often and improves word error rate from 25.4% to 9.9%.
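The headline metric here, word error rate (WER), is the standard measure for this kind of evaluation: the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the recognizer's hypothesis, normalized by the number of reference words. A minimal sketch of the computation (illustrative only, not the paper's evaluation code):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance between the reference
    transcript and the recognizer's hypothesis, divided by the number
    of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, if a repeated word from a dysfluency survives into the transcript (`"set a a timer"` against the intended `"set a timer"`), that single insertion yields a WER of 1/3.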
Design activities, such as brainstorming or critique, often take place in open spaces combining whiteboards and tables to present artefacts. In co-located settings, peripheral awareness enables participants to understand each other’s locus of attention with ease. However, these spatial cues are mostly lost when using videoconferencing tools. Telepresence robots could bring back a sense of presence, but controlling them is distracting. To address this problem, we present ReMotion, a fully automatic robotic proxy designed to explore a new way of supporting non-collocated open-space design activities. ReMotion combines a commodity body tracker (Kinect) to capture a user’s location and orientation over a wide area with a minimally invasive wearable system (NeckFace) to capture facial expressions. Thanks to its omnidirectional platform, ReMotion’s embodiment can render a wide range of body movements. A formative evaluation indicated that our system enhances the sharing of attention and the sense of co-presence, enabling seamless movement in space during a design review task.
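The abstract does not describe ReMotion's control loop, but one simple way an omnidirectional platform can automatically mirror a tracked user's pose is proportional control: map the error between the user's tracked pose and the robot's pose directly to translational and rotational velocity. The function below is a toy sketch under that assumption, not the system's actual controller:

```python
def mirror_step(robot, user, k=0.5):
    """One proportional-control step toward the tracked user's pose.

    robot, user: (x, y, theta) tuples; an omnidirectional base can
    command x, y, and rotational velocity independently, so each axis
    is driven by its own pose error scaled by gain k.
    """
    vx = k * (user[0] - robot[0])
    vy = k * (user[1] - robot[1])
    vtheta = k * (user[2] - robot[2])
    # Integrate one timestep (unit dt for simplicity).
    return (robot[0] + vx, robot[1] + vy, robot[2] + vtheta)
```

Repeated steps shrink the pose error geometrically (by a factor of 1 − k per step), so the proxy smoothly converges on the user's position and orientation rather than jumping to it.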
Data scientists often have to use separate presentation tools (e.g., Microsoft PowerPoint) to create slides that communicate the analyses they performed in computational notebooks. Much tedious and repetitive work is needed to transfer content from notebooks (e.g., code, plots) into presentable material on slides (e.g., bullet points, figures). We propose a human-AI collaborative approach and operationalize it within Slide4N, an interactive AI assistant that helps data scientists create slides from computational notebooks. Slide4N leverages advanced natural language processing techniques to distill key information from user-selected notebook cells and then renders it in appropriate slide layouts. The tool also provides intuitive interactions that allow further refinement and customization of the generated slides. We evaluated Slide4N in a two-part user study, where participants appreciated this human-AI collaborative approach over fully manual or fully automatic methods. The results also indicate the usefulness and effectiveness of Slide4N in creating slides from notebooks.
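To make the notebook-to-slide mapping concrete, here is a toy heuristic (our illustration, not Slide4N's actual NLP pipeline) for distilling user-selected cells into slide content: markdown headings become the slide title, remaining markdown lines and code comments become bullet points.

```python
def cells_to_slide(cells):
    """Toy distillation of selected notebook cells into one slide.

    cells: list of dicts with "type" ("markdown"/"code") and "source".
    Returns {"title": str, "bullets": [str, ...]}.
    """
    title, bullets = None, []
    for cell in cells:
        if cell["type"] == "markdown":
            for line in cell["source"].splitlines():
                if line.startswith("#") and title is None:
                    # First heading becomes the slide title.
                    title = line.lstrip("# ").strip()
                elif line.strip():
                    bullets.append(line.strip())
        elif cell["type"] == "code":
            # Code comments often summarize a step; surface them as bullets.
            bullets += [line.strip().lstrip("# ").strip()
                        for line in cell["source"].splitlines()
                        if line.strip().startswith("#")]
    return {"title": title or "Untitled", "bullets": bullets}
```

A real system would replace these string heuristics with the learned summarization the abstract describes, but the input/output shape (selected cells in, titled bullet slide out) is the same.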
In video meetings, individuals may wish to share various physical objects with remote participants, such as physical documents, design prototypes, and personal belongings. However, our formative study discovered that this poses several challenges, including difficulty referencing a remote user's physical objects, limited visibility of shared objects, and the friction of properly framing and orienting an object for the camera. To address these challenges, we propose ThingShare, a video-conferencing system designed to facilitate the sharing of physical objects during remote meetings. With ThingShare, users can quickly create digital copies of physical objects in the video feeds, which can then be magnified on a separate panel for focused viewing, overlaid on the user’s video feed for sharing in context, and stored in the object drawer for later review. Our user study demonstrated that ThingShare made initiating object-centric conversations more efficient and provided a more stable and comprehensive view of shared objects.
Over the years, the task of AI-assisted data annotation has seen remarkable advancements. However, a specific type of annotation task, the qualitative coding performed during thematic analysis, has characteristics that make effective human-AI collaboration difficult. Informed by a formative study, we designed PaTAT, a new AI-enabled tool that uses an interactive program synthesis approach to learn flexible and expressive patterns over user-annotated codes in real time as users annotate data. To accommodate the ambiguous, uncertain, and iterative nature of thematic analysis, the use of user-interpretable patterns allows users to understand and validate what the system has learned, make direct fixes, and easily revise, split, or merge previously annotated codes. Beyond facilitating the "learning" of the AI model, this approach also helps human users learn the characteristics of their data and form new theories. PaTAT’s usefulness and effectiveness were evaluated in a lab user study.
Editing, such as editing conceptual diagrams, is a typical office task that requires numerous tedious GUI operations, resulting in poor interaction efficiency and user experience, especially on mobile devices. In this paper, we present a new type of human-computer collaborative editing tool (CET) that enables accurate and efficient editing with little interaction effort. CET divides the task into two parts, with the human and the computer each focusing on their respective specialties: the human describes high-level editing goals with multimodal commands, while the computer calculates, recommends, and performs the detailed operations. We conducted a formative study (N = 16) to determine the concrete task division and implemented the tool on Android devices for the specific task of editing concept diagrams. The user study (N = 24 + 20) showed that CET increased diagram editing speed by 32.75% compared with existing state-of-the-art commercial tools and led to better editing results and user experience.