This paper presents a mixed methods study on how deaf, hard of hearing and hearing viewers perceive live TV caption quality with captioned video stimuli designed to mirror TV captioning experiences. To assess caption quality, we used four commonly-used quality metrics focusing on accuracy: word error rate, weighted word error rate, automated caption evaluation (ACE), and its successor ACE2. We calculated the correlation between the four quality metrics and viewer ratings for subjective quality and found that the correlation was weak, revealing that other factors besides accuracy affect user ratings. Additionally, even high-quality captions are perceived to have problems, despite controlling for confounding factors. Qualitative analysis of viewer comments revealed three major factors affecting their experience: Errors within captions, difficulty in following captions, and caption appearance. The findings raise questions as to how objective caption quality metrics can be reconciled with the user experience across a diverse spectrum of viewers.
https://doi.org/10.1145/3613904.3641988
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2024.acm.org/)