SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

Abstract

Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs.

Authors
Zheng Ning
University of Notre Dame, Notre Dame, Indiana, United States
Brianna L. Wimer
University of Notre Dame, South Bend, Indiana, United States
Kaiwen Jiang
Beijing Jiaotong University, Beijing, China
Keyi Chen
University of California San Diego, San Diego, California, United States
Jerrick Ban
University of Notre Dame, Notre Dame, Indiana, United States
Yapeng Tian
University of Texas at Dallas, Richardson, Texas, United States
Yuhang Zhao
University of Wisconsin-Madison, Madison, Wisconsin, United States
Toby Jia-Jun Li
University of Notre Dame, Notre Dame, Indiana, United States
Paper URL

doi.org/10.1145/3613904.3642632

Video

Conference: CHI 2024

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2024.acm.org/)

Session: Supporting Accessibility of Text, Image and Video B

Room: 313B
5 presentations
2024-05-14 23:00:00 – 2024-05-15 00:20:00