Conversational agents (CAs) have great potential to reduce clinicians' burden in screening for neurocognitive disorders among older adults. It is therefore important to develop engaging CAs that elicit conversational speech input from older adult participants to support the assessment of cognitive abilities. As an initial step, this paper presents research on developing a backchanneling ability in CAs, in the form of verbal responses that engage the speaker. We analyzed 246 conversations of cognitive assessments between older adults and human assessors, and derived the categories of reactive backchannels (e.g., ``hmm'') and proactive backchannels (e.g., ``please keep going''). These categories were used to develop TalkTive, a CA that can predict both the timing and the form of backchanneling during cognitive assessments. We then invited 36 older adult participants to evaluate the backchanneling feature. Results show that participants appreciated proactive backchanneling more than reactive backchanneling.
https://dl.acm.org/doi/abs/10.1145/3491102.3502005
Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fine-grained categories. ProtoSound is motivated by prior work examining the sound awareness needs of DHH people and by a survey we conducted with 472 DHH participants. To evaluate ProtoSound, we characterized its performance on two real-world sound datasets, showing significant improvement over the state of the art (e.g., +9.7% accuracy on the first dataset). We then deployed ProtoSound's end-user training and real-time recognition through a mobile application and recruited 19 hearing participants who listened to the real-world sounds and rated the accuracy across 56 locations (e.g., homes, restaurants, parks). Results show that ProtoSound personalized the model on-device in real time and accurately learned sounds across diverse acoustic contexts. We close by discussing open challenges in personalizable sound recognition, including the need for better recording interfaces and algorithmic improvements.
https://dl.acm.org/doi/abs/10.1145/3491102.3502020
As voice-based personal assistant technologies (e.g., smart speakers in homes) proliferate, and more generally as voice control of technology becomes increasingly ubiquitous, new accessibility barriers are emerging for many Deaf and Hard of Hearing (DHH) users. Progress in sign-language recognition may enable devices to respond to sign-language commands and potentially mitigate these barriers, but research is needed to understand how DHH users would interact with these devices and what commands they would issue. In this work, we directly engage with the DHH community, using a Wizard-of-Oz prototype that appears to understand American Sign Language (ASL) commands. Our analysis of video recordings of DHH participants revealed how they woke up the device to initiate commands, structured commands in ASL, and responded to device errors, providing guidance to future designers and researchers. We share our dataset of over 1400 commands, which may be of interest to sign-language-recognition researchers.
https://dl.acm.org/doi/abs/10.1145/3491102.3501987
Conversational interfaces increasingly rely on human-like dialogue to offer a natural experience. However, relying on dialogue involving multiple exchanges for even simple tasks can overburden users, particularly older adults. In this paper, we explored the use of politeness theory in conversation design to alleviate this burden and improve user experience. To achieve this goal, we categorized the voice interaction offered by a smart display application designed for older adults into seven major speech acts: request, suggest, instruct, comment, welcome, farewell, and repair. We identified face needs for each speech act, applied politeness strategies that best address these needs, and tested the ability of these strategies to shape the perceived politeness of a voice assistant in an online study ($n=64$). Based on the findings of this study, we designed \textit{direct} and \textit{polite} versions of the system and conducted a field study ($n=15$) in which participants used each of the versions for five days at their homes. Based on five factors merged from our qualitative findings, we identified four distinctive user personas---\textit{socially oriented follower}, \textit{socially oriented leader}, \textit{utility oriented follower}, and \textit{utility oriented leader}---that can inform personalized design of smart displays.
https://dl.acm.org/doi/abs/10.1145/3491102.3517525