Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes

Abstract

Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions but lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). Our proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics and principles of scene audio composition. Through a user study with 11 BLV participants, we found that combining Scene2Audio sounds with speech creates a better experience than speech alone, as the sound effects complement the speech, making the scene easier to imagine. An "in-the-wild" mobile app study with 7 BLV users over more than a week further showed the potential of Scene2Audio to enhance outdoor scene experiences. Our work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids, addressing the aesthetic needs of BLV users.

Authors
Chitralekha Gupta
National University of Singapore, Singapore, Singapore
Jing Peng
National University of Singapore, Singapore, Singapore
Ashwin Ram
Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Shreyas Sridhar
National University of Singapore, Singapore, Singapore
Christophe Jouffrais
CNRS, Toulouse, France
Suranga Nanayakkara
School of Computing, National University of Singapore, Singapore, Singapore

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Captioning, Description, and Media Interaction

P1 - Room 120
7 presentations
2026-04-16, 18:00–19:30