Audio descriptions (AD) make videos accessible to those who cannot see them. But many videos lack AD and remain inaccessible as traditional approaches involve expensive professional production. We aim to lower production costs by involving novices in this process. We present an AD authoring system that supports novices to write scene descriptions (SD)—textual descriptions of video scenes—and convert them into AD via text-to-speech. The system combines video scene recognition and natural language processing to review novice-written SD and feeds back what to mention automatically. To assess the effectiveness of this automatic feedback in supporting novices, we recruited 60 participants to author SD with no feedback, human feedback, and automatic feedback. Our study shows that automatic feedback improves SD's descriptiveness, objectiveness, and learning quality, without affecting qualities like sufficiency and clarity. Though human feedback remains more effective, automatic feedback can reduce production costs by 45%.
https://doi.org/10.1145/3544548.3581023
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)