Toward Automatic Audio Description Generation for Accessible Videos

Abstract

Video accessibility is essential for people with visual impairments. Audio descriptions narrate what is happening on-screen, e.g., physical actions, facial expressions, and scene changes. Producing high-quality audio descriptions requires substantial manual effort. To address this accessibility obstacle, we built a system that analyzes the audiovisual content of a video and generates audio descriptions. The system consists of three modules: AD insertion time prediction, AD generation, and AD optimization. We evaluated the quality of our system on five types of videos by conducting qualitative studies with 20 sighted users and 12 users who were blind or visually impaired. Our findings revealed how audio description preferences varied with user type and video type. Based on our analysis, we provide recommendations for the development of future audio description generation technologies.

Authors
Yujia Wang
Beijing Institute of Technology, Beijing, China
Wei Liang
Beijing Institute of Technology, Beijing, China
Haikun Huang
George Mason University, Fairfax, Virginia, United States
Yongqi Zhang
University of Massachusetts Boston, Boston, Massachusetts, United States
Dingzeyu Li
Adobe Research, Seattle, Washington, United States
Lap-Fai Yu
George Mason University, Fairfax, Virginia, United States
DOI

10.1145/3411764.3445347

Paper URL

https://doi.org/10.1145/3411764.3445347

Conference: CHI 2021

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2021.acm.org/)

Session: Accessible Content Creation

[A] Paper Room 01, 2021-05-10 17:00:00~2021-05-10 19:00:00 / [B] Paper Room 01, 2021-05-11 01:00:00~2021-05-11 03:00:00 / [C] Paper Room 01, 2021-05-11 09:00:00~2021-05-11 11:00:00
Paper Room 01 (11 presentations)