Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users

Abstract

Speech is expressive in ways that caption text does not capture, leaving emotion and emphasis unconveyed. We interviewed eight Deaf and Hard-of-Hearing (DHH) individuals to understand whether and how captions' inexpressiveness affects them in online meetings with hearing peers. We found that automatically captioned speech lacks affective depth, making it ambiguous, hard to parse, and generally dull. Interviewees regularly feel excluded, and some regard this as an inherent quality of such meetings rather than a consequence of current caption design. We then developed three novel captioning models that depict, beyond words, features drawn from prosody, from emotions, or from a mix of both. In an empirical study, 16 DHH participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in conveying emotion and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive caption design.

Authors
Caluã de Lacerda Pataca
Rochester Institute of Technology, Rochester, New York, United States
Matthew P. Watkins
Rochester Institute of Technology, Rochester, New York, United States
Roshan L. Peiris
Rochester Institute of Technology, Rochester, New York, United States
Sooyeon Lee
New Jersey Institute of Technology, Newark, New Jersey, United States
Matt Huenerfauth
Rochester Institute of Technology, Rochester, New York, United States
Paper URL

https://doi.org/10.1145/3544548.3581511

Conference: CHI 2023

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)

Session: Visualization and Data

Room Y03+Y04
6 presentations
2023-04-25, 20:10–21:35