Sound, Music, and Dance Accessibility

Conference Name
CHI 2026
FAME: Exploring Expressive Facial Avatars for Lyrical and Non-Lyrical Music Visualization for d/Deaf Individuals
Abstract

d/Deaf and Hard of Hearing (DHH) individuals often engage with music multimodally, drawing on visual channels rather than relying on sound alone. While tools like captions and visualizers offer partial support, they often fail to capture the emotional depth and structural nuances of music. To explore new possibilities, we adopted an iterative, probe-based approach. Through a formative study with 9 DHH participants, we identified key design requirements for visualizing rhythm, emotion, and lyrics. We developed FAME (Facial Avatar for Musical Expression), a design probe that conveys music through expressive facial animation, instrument highlights, and synchronized captions, lip-syncing to lyrics or scat-singing to melodies. Through a two-phase exploratory study with 12 DHH users, we examined FAME’s efficacy, applicability, and requirements for representing musical elements. Our findings refine design requirements for avatar-based systems and highlight the potential of avatars as expressive and socially meaningful tools for music accessibility.

Authors
Suhyeon Yoo
University of Toronto, Toronto, Ontario, Canada
Yifang Pan
University of Toronto, Toronto, Ontario, Canada
Ashish Ajin Thomas
University of Toronto, Toronto, Ontario, Canada
Karan Singh
University of Toronto, Toronto, Ontario, Canada
Khai N. Truong
University of Toronto, Toronto, Ontario, Canada
Designing a Generative AI-Assisted Music Psychotherapy Tool for Deaf and Hard-of-Hearing Individuals
Abstract

Songwriting has long served as a powerful medium for expressing unconscious emotions and fostering self-awareness in psychotherapy. Due to the auditory-centric nature of traditional approaches, Deaf and Hard-of-Hearing (DHH) individuals have often been excluded from music’s therapeutic benefits. In response, this study presents a music psychotherapy tool co-designed with therapists, integrating conversational agents (CAs) and music generative AI as symbolic and therapeutic media. Through a usage study with 23 DHH individuals, we found that collaborative songwriting with the CA enabled them to experience emotional release, re-interpretation, and deeper self-understanding. In particular, the CA’s strategies—supportive empathy, example response options, and visual-based metaphors—were found to facilitate musical dialogue effectively for DHH individuals. These findings contribute to inclusive AI design by showing the potential of human–AI collaboration to bridge therapeutic and artistic practices.

Authors
Youjin Choi
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of
JaeYoung Moon
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of
JinYoung Yoo
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of
Jennifer G. Kim
Georgia Institute of Technology, Atlanta, Georgia, United States
Jin-Hyuk Hong
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of
A Sound Understanding --- An In-Situ Deployment of an Accessible Audio-Media Player with People Living with Aphasia
Abstract

Audio media -- radio, podcasts, audiobooks -- structures everyday life: we keep up, wind down, and share moments through long-form listening. Yet for people living with aphasia -- a communication disability that affects audio comprehension -- unsupported audio often means losing the thread and a marred experience. While accessibility advances have focused on print, web, and audiovisual content, audio-only media remains largely unconsidered, oftentimes optimised for marketisation rather than sustained understanding. We report a three-week in-situ deployment of the Re-Connect app, an audio media player that meets people at the moment of comprehension difficulty. With ten adults living with aphasia, we show how people assemble personal repertoires of small, co-present communication cues that repair understanding in the moment and support recall. Grounded in lived experience, we argue for personal, source-proximate scaffolds that help make long-form audio more understandable and enjoyable.

Authors
Filip Bircanin
King's College London, London, United Kingdom
Alexandre Nevsky
King's College London, London, United Kingdom
Madeline N. Cruice
City St George's, University of London, London, United Kingdom
Ognjen Markovic
Aparteko, Belgrade, Serbia
Timothy Neate
King's College London, London, United Kingdom
CLARIS: Clear and Intelligible Speech from Whispered and Dysarthric Voices
Abstract

Whispered and dysarthric speech hinder effective communication and undermine the reliability of voice-enabled systems. We present CLARIS, a compact speech-to-speech restoration system that turns such atypical input into clear, expressive speech. CLARIS requires no disorder-specific architectural tuning, generalizes across languages, and adapts quickly to new accents and speakers, enabling practical personalization. On whispered English, Hindi, and clinically challenging dysarthric speech, CLARIS delivers state-of-the-art intelligibility and naturalness, with listener studies confirming gains in quality, intelligibility, naturalness, and prosody. The system runs in real time, converting one second of input in about 30 ms, and enables inclusive, private, and personalized voice interaction. Audio samples are available at https://claris-w2s.github.io/CLARIS/

Authors
Neil Shah
TCS Research, Pune, Maharashtra, India
Yash Sonkar
CVIT, IIIT Hyderabad, Hyderabad, Telangana, India
Shirish Subhash Karande
TCS Research, Pune, Maharashtra, India
Vineet Gandhi
IIIT Hyderabad, Hyderabad, India
Co-Designing Multimodal Systems for Accessible Asynchronous Dance Instruction
Abstract

Videos make exercise instruction widely available, but they rely on visual demonstrations that blind and low vision (BLV) learners cannot see. While audio descriptions (AD) can make videos accessible, describing movements remains challenging as the AD must convey what to do (mechanics, location, orientation) and how to do it (speed, fluidity, timing). Prior work thus used multimodal instruction to support BLV learners with individual simple movements. However, it is unclear how these approaches scale to dance instruction with unique, complex movements and precise timing constraints. To inform accessible remote dance instruction systems, we conducted three co-design workshops (N=28) with BLV dancers, instructors, and experts in sound, haptics, and AD. Participants designed 8 systems revealing common themes: staged learning to dissect routines, crafting vocabularies for movements, and selectively using modalities—narration for movement structure, sound for expression, and haptics for spatial cues. We conclude with design implications to make learning dance accessible.

Authors
Ujjaini Das
University of Texas at Austin, Austin, Texas, United States
Shreya Kappala
University of Texas at Austin, Austin, Texas, United States
Meng Chen
University of California, Berkeley, Berkeley, California, United States
Mina Huh
University of Texas at Austin, Austin, Texas, United States
Amy Pavel
University of California, Berkeley, Berkeley, California, United States
MYOLINK esports: Exploring EMG-based Control Interface Through Muscle Activation and Inhibition to Enable Common Gameplay Mechanics among Players with and without Physical Disabilities
Abstract

Esports have highlighted both their potential for social inclusion and the accessibility challenges faced by individuals with physical disabilities. This study introduces a novel paradigm for inclusive esports by shifting game control from traditional kinematic inputs to kinetic inputs. An EMG-based control interface enables gameplay through force regulation (e.g., muscle activation and inhibition). The aim is to explore how this interface enables common gameplay mechanics among players with and without physical disabilities. User Study 1 involved 20 able-bodied participants performing competitive esports tasks to examine how EMG-based control accuracy is influenced by movement range, such as wrist and elbow motion. User Study 2 extended the investigation to eight participants with physical disabilities to compare control accuracy between disabled and able-bodied users. The findings suggest that the interface enables common gameplay mechanics for individuals who can separately control the activation and inhibition of each muscle corresponding to each EMG sensor via calibration adjustment, but disability-related involuntary muscle activity and unintended co-contraction remain a major challenge for the interface.

Authors
Masato Shindo
NTT, Inc., Yokosuka, Kanagawa, Japan
Shiina Takano
NTT Human Informatics Laboratories, NTT Corporation, Yokosuka, Japan
Shuto Sako
Nihon University, Tokyo, Japan
Akihiro Miyata
Nihon University, Tokyo, Japan
Ryosuke Aoki
NTT, Inc., Yokosuka, Kanagawa, Japan
Video
SceneScout: Towards AI-Driven Access to Street Level Imagery for Blind Users
Abstract

People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation assistance, those supporting pre-travel assistance typically provide information about only landmarks and turn-by-turn instructions, lacking detailed visual context. Street-level imagery, which contains rich visual information and has the potential to reveal environmental details, remains inaccessible to BLV people. In this work, we present SceneScout, a multimodal large language model (MLLM)-driven prototype that enables accessible interactions with street-level imagery. SceneScout supports two modes: (1) Route Preview, enabling users to familiarize themselves with visual details along a route, and (2) Virtual Exploration, enabling user-driven movement within street-level imagery. Our user study demonstrates that SceneScout helps BLV users uncover visual information otherwise unavailable through existing means. An initial analysis of AI-generated descriptions suggests that the majority are accurate and describe stable visual elements even in older imagery, though occasional subtle and plausible errors make them difficult to verify without sight. We discuss future opportunities and challenges of street-level imagery-based navigation experiences.

Authors
Gaurav Jain
Columbia University, New York, New York, United States
Leah Findlater
Apple, Cupertino, California, United States
Cole Gleason
Apple Inc., Seattle, Washington, United States