Assistive Interactions: Audio Interactions and Deaf and Hard of Hearing Users

Conference Name
CHI 2024
Audio Engineering by People Who Are Deaf and Hard of Hearing: Balancing Confidence and Limitations
Abstract

With technological advancements, audio engineering has evolved from a domain exclusive to professionals to one open to amateurs. However, research is limited on the accessibility of audio engineering, particularly for deaf, Deaf, and hard of hearing (DHH) individuals. To bridge this gap, we interviewed eight DHH audio engineers in music to understand accessibility in audio engineering. We found that their hearing magnified challenges in audio engineering: insecurities in sound perception undermined their confidence, and the required extra "hearing work" added complexity. As workarounds, participants employed various technologies and techniques, relied on the support of hearing peers, and developed strategies for learning and growth. Through these practices, they navigate audio engineering while balancing confidence and limitations. For future directions, we recommend exploring technologies that reduce insecurities and "hearing work" to empower DHH audio engineers and working toward a DHH-community-driven approach to accessible audio engineering.

Authors
Keita Ohshiro
New Jersey Institute of Technology, Newark, New Jersey, United States
Mark Cartwright
New Jersey Institute of Technology, Newark, New Jersey, United States
Paper URL

doi.org/10.1145/3613904.3642454

Look Once to Hear: Target Speech Hearing with Noisy Examples
Abstract

In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing that ignores all interfering speech and noise except the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. However, this is not well aligned with the hearable application domain, since obtaining a clean example is challenging in real-world scenarios, creating a unique user interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process an 8 ms audio chunk in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not degrade performance compared to clean examples, while being convenient and user-friendly. More broadly, this paper takes an important step towards enhancing human auditory perception with artificial intelligence.
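As a rough illustration of the pipeline this abstract describes, the sketch below computes a speaker embedding once from a short, noisy binaural enrollment clip and then conditions a mask-based extractor on that embedding for each incoming audio chunk. This is a minimal sketch under our own assumptions: the module names (EnrollmentEncoder, ConditionedExtractor), layer sizes, and FiLM-style conditioning are illustrative and are not the authors' published architecture.

```python
# Minimal sketch (our assumptions, not the authors' architecture):
# 1) embed a short, noisy binaural enrollment clip into a speaker vector;
# 2) condition a mask-based extractor on that vector to pull the target
#    speaker out of each incoming audio chunk.
import torch
import torch.nn as nn

SAMPLE_RATE = 16_000  # assumed sampling rate for the sketch


class EnrollmentEncoder(nn.Module):
    """Maps a noisy binaural enrollment clip to a speaker embedding."""

    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=16, stride=8), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=8, stride=4), nn.ReLU(),
        )
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 2, samples), noisy binaural enrollment audio
        feats = self.conv(clip)      # (batch, 128, frames)
        pooled = feats.mean(dim=-1)  # average over time
        return self.proj(pooled)     # (batch, emb_dim)


class ConditionedExtractor(nn.Module):
    """Masks the mixture, conditioned on the target speaker embedding."""

    def __init__(self, emb_dim: int = 128, hidden: int = 128):
        super().__init__()
        self.encode = nn.Conv1d(2, hidden, kernel_size=16, stride=8)
        self.film = nn.Linear(emb_dim, 2 * hidden)  # FiLM-style conditioning
        self.mask = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.decode = nn.ConvTranspose1d(hidden, 1, kernel_size=16, stride=8)

    def forward(self, chunk: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        # chunk: (batch, 2, samples) mixture; emb: (batch, emb_dim)
        h = torch.relu(self.encode(chunk))
        scale, shift = self.film(emb).chunk(2, dim=-1)
        h = h * scale.unsqueeze(-1) + shift.unsqueeze(-1)
        m = torch.sigmoid(self.mask(h))
        return self.decode(h * m)  # (batch, 1, samples) target speech estimate


# Usage: enroll once from a few seconds of noisy audio, then stream chunks.
encoder, extractor = EnrollmentEncoder(), ConditionedExtractor()
enrollment = torch.randn(1, 2, 4 * SAMPLE_RATE)      # ~4 s noisy binaural clip
embedding = encoder(enrollment)                      # computed once
chunk = torch.randn(1, 2, int(0.008 * SAMPLE_RATE))  # one 8 ms chunk
target = extractor(chunk, embedding)                 # target-only estimate
```

In a real system, both networks would be trained jointly on mixtures paired with clean target speech and optimized for causal, low-latency inference to meet the embedded-CPU budget (8 ms chunks processed in 6.24 ms) reported above.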

Award
Honorable Mention
Authors
Bandhav Veluri
University of Washington, Seattle, Washington, United States
Malek Itani
University of Washington, Seattle, Washington, United States
Tuochao Chen
University of Washington, Seattle, Washington, United States
Takuya Yoshioka
IEEE, Redmond, Washington, United States
Shyamnath Gollakota
University of Washington, Seattle, Washington, United States
Paper URL

doi.org/10.1145/3613904.3642057

Communication, Collaboration, and Coordination in a Co-located Shared Augmented Reality Game: Perspectives From Deaf and Hard of Hearing People
Abstract

Co-located collaborative shared augmented reality (CS-AR) environments have gained considerable research attention, mainly focusing on design, implementation, accuracy, and usability. Yet a gap persists in our understanding of the accessibility and inclusivity of such environments for diverse user groups, such as deaf and hard of hearing (DHH) people. To investigate this domain, we used Urban Legends, a multiplayer game in a co-located CS-AR setting. We conducted a user study followed by one-on-one interviews with 17 DHH participants. Our findings revealed the use of multimodal communication (verbal and non-verbal) before and during the game, its impact on the amount of collaboration among participants, and how participants' coordination with AR components, their surroundings, and one another improved across rounds. Drawing on these data, we propose design enhancements, including on-screen visuals and speech-to-text transcription, centered on participant perspectives and our analysis.

著者
Sanzida Mojib Luna
Rochester Institute of Technology, Rochester, New York, United States
Jiangnan Xu
Rochester Institute of Technology, West Henrietta, New York, United States
Konstantinos Papangelis
Rochester Institute of Technology, Rochester, New York, United States
Garreth W. Tigwell
Rochester Institute of Technology, Rochester, New York, United States
Nicolas LaLone
Rochester Institute of Technology, Rochester, New York, United States
Michael Saker
City University London, London, United Kingdom
Alan Chamberlain
University of Nottingham, Nottingham, United Kingdom
Samuli Laato
Tampere University, Tampere, Finland
John Dunham
Rochester Institute of Technology, Rochester, New York, United States
Yihong Wang
Xi'an Jiaotong-Liverpool University, Suzhou, China
Paper URL

doi.org/10.1145/3613904.3642953

"Voices Help Correlate Signs and Words": Analyzing Deaf and Hard-of-Hearing (DHH) TikTokers’ Content, Practices, and Pitfalls
Abstract

Video-sharing platforms such as TikTok have offered new opportunities for d/Deaf and hard-of-hearing (DHH) people to create public-facing content using sign language, an integral part of DHH culture. Besides sign language, DHH creators deal with a variety of modalities when creating videos, such as captions and audio. However, hardly any work has comprehensively addressed DHH creators' multimodal practices with the lay public's reactions taken into account. In this paper, we systematically analyzed 308 DHH-authored TikTok videos using a mixed-methods approach, focusing on DHH TikTokers' content, practices, pitfalls, and viewer engagement. Our findings highlight that while voice features such as synchronous voices are scant and challenging for DHH TikTokers, they may help promote viewer engagement. Other empirical findings, including the distributions of topics, practices, pitfalls, and their correlations with viewer engagement, further lead to actionable suggestions for DHH TikTokers and video-sharing platforms.
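As a small, hypothetical illustration of the kind of practice-engagement comparison this abstract mentions, the sketch below tests whether videos with a given practice flag differ in engagement. The DataFrame columns (has_sync_voice, engagement) and all values are invented for illustration; this is not the authors' actual dataset or analysis.

```python
# Hypothetical illustration only: the column names and values below are
# invented; this is not the authors' dataset or analysis pipeline.
import pandas as pd
from scipy.stats import mannwhitneyu

# One row per video: a binary practice flag (e.g., synchronous voice
# present) and an engagement measure (e.g., likes per view).
videos = pd.DataFrame({
    "has_sync_voice": [True, False, False, True, False, True],
    "engagement":     [0.42, 0.10, 0.08, 0.35, 0.12, 0.50],
})

with_voice = videos.loc[videos["has_sync_voice"], "engagement"]
without_voice = videos.loc[~videos["has_sync_voice"], "engagement"]

# Engagement metrics are typically skewed, so a rank-based test is a
# safer default than a t-test for this comparison.
stat, p = mannwhitneyu(with_voice, without_voice, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")
```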

Authors
Jiaxun Cao
Duke Kunshan University, Kunshan, Jiangsu, China
Xuening Peng
University of Florida, Gainesville, Florida, United States
Fan Liang
Duke Kunshan University, Kunshan, China
Xin Tong
Duke Kunshan University, Kunshan, Suzhou, China
Paper URL

doi.org/10.1145/3613904.3642413
