With technological advancements, audio engineering has evolved from a domain exclusive to professionals to one open to amateurs. However, research is limited on the accessibility of audio engineering, particularly for deaf, Deaf, and hard of hearing (DHH) individuals. To bridge this gap, we interviewed eight deaf and hard of hearing (dHH) audio engineers in music to understand accessibility in audio engineering. We found that their hearing magnified challenges in audio engineering: insecurities in sound perception undermined their confidence, and the required extra "hearing work" added complexity. As workarounds, participants employed various technologies and techniques, relied on the support of hearing peers, and developed strategies for learning and growth. Through these practices, they navigate audio engineering while balancing confidence and limitations. For future directions, we recommend exploring technologies that reduce insecurities and "hearing work" to empower DHH audio engineers and working toward a DHH-community-driven approach to accessible audio engineering.
https://doi.org/10.1145/3613904.3642454
In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing that ignores all interfering speech and noise except the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This is, however, not well aligned with the hearable application domain, since obtaining a clean example is challenging in real-world scenarios, creating a unique user interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 ms audio chunks in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. More broadly, this paper takes an important step toward enhancing human auditory perception with artificial intelligence.
https://doi.org/10.1145/3613904.3642057
Co-located collaborative shared augmented reality (CS-AR) environments have gained considerable research attention, mainly focusing on design, implementation, accuracy, and usability. Yet, a gap persists in our understanding of the accessibility and inclusivity of such environments for diverse user groups, such as deaf and hard of hearing (DHH) people. To investigate this domain, we used Urban Legends, a multiplayer game in a co-located CS-AR setting. We conducted a user study followed by one-on-one interviews with 17 DHH participants. Our findings revealed the usage of multimodal communication (verbal and non-verbal) before and during the game, impacting the amount of collaboration among participants and how their coordination with AR components, their surroundings, and other participants improved throughout the rounds. We use our data to propose design enhancements, including onscreen visuals and speech-to-text transcription, centered on participant perspectives and our analysis.
https://doi.org/10.1145/3613904.3642953
Video-sharing platforms such as TikTok have offered new opportunities for d/Deaf and hard-of-hearing (DHH) people to create public-facing content using sign language, an integral part of DHH culture. Besides sign language, DHH creators deal with a variety of modalities when creating videos, such as captions and audio. However, hardly any work has comprehensively addressed DHH creators' multimodal practices with the lay public's reactions taken into account. In this paper, we systematically analyzed 308 DHH-authored TikTok videos using a mixed-methods approach, focusing on DHH TikTokers' content, practices, pitfalls, and viewer engagement. Our findings highlight that while voice features such as synchronous voices are scant and challenging for DHH TikTokers, they may help promote viewer engagement. Other empirical findings, including the distributions of topics, practices, pitfalls, and their correlations with viewer engagement, further lead to actionable suggestions for DHH TikTokers and video-sharing platforms.
https://doi.org/10.1145/3613904.3642413