Audio for Accessibility

Conference name
CHI 2022
TalkTive: A Conversational Agent Using Backchannels to Engage Older Adults in Neurocognitive Disorders Screening
Abstract

Conversational agents (CAs) have great potential to mitigate clinicians' burden in screening for neurocognitive disorders among older adults. It is therefore important to develop engaging CAs that elicit conversational speech input from older adult participants to support the assessment of cognitive abilities. As an initial step, this paper presents research on developing a backchanneling ability in CAs, in the form of verbal responses that engage the speaker. We analyzed 246 conversations of cognitive assessments between older adults and human assessors, and derived the categories of reactive backchannels (e.g., "hmm") and proactive backchannels (e.g., "please keep going"). These categories are used in the development of TalkTive, a CA that can predict both the timing and the form of backchanneling during cognitive assessments. The study then invited 36 older adult participants to evaluate the backchanneling feature. Results show that participants appreciated proactive backchanneling more than reactive backchanneling.

Authors
Zijian Ding
University of Maryland, West Hyattsville, Maryland, United States
Jiawen Kang
Department of Systems Engineering and Engineering Management, Hong Kong, China
Tinky Oi Ting HO
The Chinese University of Hong Kong, Hong Kong, Hong Kong
Ka Ho Wong
The Chinese University of Hong Kong, Hong Kong, China
Helene H. Fung
Chinese University of Hong Kong, Hong Kong, Hong Kong SAR, China
Helen Meng
CUHK, Shatin, Hong Kong
Xiaojuan Ma
Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502005

ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users
Abstract

Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fine-grained categories. ProtoSound is motivated by prior work examining sound awareness needs of DHH people and by a survey we conducted with 472 DHH participants. To evaluate ProtoSound, we characterized performance on two real-world sound datasets, showing significant improvement over state-of-the-art (e.g., +9.7% accuracy on the first dataset). We then deployed ProtoSound's end-user training and real-time recognition through a mobile application and recruited 19 hearing participants who listened to the real-world sounds and rated the accuracy across 56 locations (e.g., homes, restaurants, parks). Results show that ProtoSound personalized the model on-device in real-time and accurately learned sounds across diverse acoustic contexts. We close by discussing open challenges in personalizable sound recognition, including the need for better recording interfaces and algorithmic improvements.

Authors
Dhruv Jain
University of Washington, Seattle, Washington, United States
Khoa Huynh Anh Nguyen
University of Washington, Seattle, Washington, United States
Steven M. Goodman
University of Washington, Seattle, Washington, United States
Rachel Grossman-Kahn
University of Washington, Seattle, Washington, United States
Hung Ngo
University of Washington, Seattle, Washington, United States
Aditya Kusupati
University of Washington, Seattle, Washington, United States
Ruofei Du
Google, San Francisco, California, United States
Alex Olwal
Google Inc., Mountain View, California, United States
Leah Findlater
University of Washington, Seattle, Washington, United States
Jon E. Froehlich
University of Washington, Seattle, Washington, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502020

Analyzing Deaf and Hard-of-Hearing Users' Behavior, Usage, and Interaction with a Personal Assistant Device that Understands Sign-Language Input
Abstract

As voice-based personal assistant technologies proliferate, e.g., smart speakers in homes, and more generally as voice control of technology becomes increasingly ubiquitous, new accessibility barriers are emerging for many Deaf and Hard of Hearing (DHH) users. Progress in sign-language recognition may enable devices to respond to sign-language commands and potentially mitigate these barriers, but research is needed to understand how DHH users would interact with these devices and what commands they would issue. In this work, we directly engage with the DHH community, using a Wizard-of-Oz prototype that appears to understand American Sign Language (ASL) commands. Our analysis of video recordings of DHH participants revealed how they woke up the device to initiate commands, structured commands in ASL, and responded to device errors, providing guidance to future designers and researchers. We share our dataset of over 1400 commands, which may be of interest to sign-language-recognition researchers.

Authors
Abraham Glasser
Rochester Institute of Technology, Rochester, New York, United States
Matthew Watkins
Rochester Institute of Technology, Rochester, New York, United States
Kira Hart
Rochester Institute of Technology, Rochester, New York, United States
Sooyeon Lee
Rochester Institute of Technology, Rochester, New York, United States
Matt Huenerfauth
Rochester Institute of Technology, Rochester, New York, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3501987

Polite or Direct? Conversation Design with Politeness Theory on a Smart Display for Older Adults
Abstract

Conversational interfaces increasingly rely on human-like dialogue to offer a natural experience. However, relying on dialogue involving multiple exchanges for even simple tasks can overburden users, particularly older adults. In this paper, we explored the use of politeness theory in conversation design to alleviate this burden and improve user experience. To achieve this goal, we categorized the voice interaction offered by a smart display application designed for older adults into seven major speech acts: request, suggest, instruct, comment, welcome, farewell, and repair. We identified face needs for each speech act, applied politeness strategies that best address these needs, and tested the ability of these strategies to shape the perceived politeness of a voice assistant in an online study (n=64). Based on the findings of this study, we designed direct and polite versions of the system and conducted a field study (n=15) in which participants used each of the versions for five days at their homes. Based on five factors merged from our qualitative findings, we identified four distinctive user personas (socially oriented follower, socially oriented leader, utility oriented follower, and utility oriented leader) that can inform personalized design of smart displays.

Authors
Yaxin Hu
University of Wisconsin-Madison, Madison, Wisconsin, United States
Yuxiao Qu
University of Wisconsin-Madison, Madison, Wisconsin, United States
Adam Maus
University of Wisconsin-Madison, Madison, Wisconsin, United States
Bilge Mutlu
University of Wisconsin-Madison, Madison, Wisconsin, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3517525
