Captions visually convey spoken information to deaf and hard-of-hearing (DHH) individuals, helping them better understand video content. In speech, the literal content and paralinguistic cues (e.g., pitch and nuance) work together to convey the speaker's actual intention. However, current captions are limited in their capacity to deliver fine nuances because they cannot fully convey these paralinguistic cues. This paper proposes an audio-visualized caption system that automatically maps paralinguistic cues onto various caption elements (thickness, height, font type, and motion). A comparative study with 20 DHH participants demonstrates how our system gives DHH individuals better access to paralinguistic cues while watching videos. Particularly for formal talks, participants identified the speaker's nuance accurately more often than with current captions, without any practice or training. Once remaining issues of legibility and familiarity are addressed, the proposed caption system has the potential to enrich DHH individuals' video-watching experience, bringing it closer to that of hearing viewers.
https://doi.org/10.1145/3544548.3581130
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)
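The abstract describes a mapping from paralinguistic cues to visual caption attributes (thickness, height, font type, motion). The following is a minimal, hypothetical sketch of such a mapping, not the authors' implementation; the feature names, thresholds, and style vocabulary are assumptions for illustration only.

```python
# Hypothetical sketch (not the paper's actual system): mapping paralinguistic
# cues extracted from speech onto caption styling attributes such as thickness
# (font weight), height (font size), and motion. Thresholds are illustrative.
from dataclasses import dataclass


@dataclass
class ParalinguisticCues:
    pitch_hz: float      # fundamental frequency of the utterance
    loudness_db: float   # relative loudness
    speech_rate: float   # syllables per second


@dataclass
class CaptionStyle:
    font_weight: int     # "thickness" of the text
    font_size_px: int    # "height" of the text
    animate: bool        # whether the caption moves (e.g., shaking or bouncing)


def style_from_cues(cues: ParalinguisticCues) -> CaptionStyle:
    """Map acoustic cues to visual caption attributes (illustrative thresholds)."""
    # Louder speech -> heavier (thicker) font weight.
    weight = 700 if cues.loudness_db > -10 else 400
    # Higher pitch -> taller (larger) text.
    size = 36 if cues.pitch_hz > 220 else 28
    # Fast and loud speech -> animated caption to signal excitement or urgency.
    animate = cues.speech_rate > 5.0 and cues.loudness_db > -10
    return CaptionStyle(font_weight=weight, font_size_px=size, animate=animate)


if __name__ == "__main__":
    excited = ParalinguisticCues(pitch_hz=260.0, loudness_db=-6.0, speech_rate=5.5)
    print(style_from_cues(excited))
    # CaptionStyle(font_weight=700, font_size_px=36, animate=True)
```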